ESTIMATING A CONCAVE DISTRIBUTION FUNCTION FROM
DATA CORRUPTED WITH ADDITIVE NOISE 

By Geurt Jongbloed and Frank H. van der Meulen 
Delft University of Technology 



We consider two nonparametric procedures for estimating a concave
distribution function based on data corrupted with additive noise
generated by a bounded decreasing density on $(0, \infty)$. For the
maximum likelihood (ML) estimator and least squares (LS) estimator,
we state qualitative properties, prove consistency and propose a
computational algorithm. For the LS estimator and its derivative,
we also derive the pointwise asymptotic distribution. Moreover, the
rate $n^{-2/5}$ achieved by the LS estimator is shown to be minimax for
estimating the distribution function at a fixed point.



1. Introduction. Let $X_1, X_2, \ldots$ be an i.i.d. sequence of random variables
with unknown distribution function $F$. Moreover, let $\varepsilon_1, \varepsilon_2, \ldots$ be an i.i.d.
sequence of random variables, independent of the $X_j$'s, with known probability
density function $k$. We want to estimate the distribution function $F$,
based on data $Z_1, Z_2, \ldots, Z_n$, where $Z_j = X_j + \varepsilon_j$. In other words, we wish
to estimate $F$ based on a sample from the density

(1)    $g_F(z) = \int k(z - x)\, dF(x)$.

Since $g_F$ is the convolution of the unknown distribution function with the
(known) density $k$, the problem of estimating aspects of the distribution
function $F$ based on a sample from $g_F$ is known as a deconvolution problem.

Deconvolution problems have been studied quite extensively during the past
two decades. Given a class $\mathcal{F}$ of distribution functions $F$, one can
qualitatively state that the smoother the noise density $k$, the worse the optimal
estimation rate for $F$. See Fan (1991). Alternatively, given a noise density
$k$, it is obvious that the smaller the class of distribution functions $\mathcal{F}$, the
better the optimal estimation rate for $F$.
One popular approach to this estimation problem is based on kernel
smoothing and Fourier methods [see, e.g., Carroll and Hall (1988) and De- 
laigle and Hall (2006)]. These estimators can achieve optimal rates of con- 
vergence under a wide range of smoothness assumptions. A characteristic 
feature of this approach is the need for a bandwidth, preferably chosen 
in an asymptotically optimal way. Many methods have been developed to 
determine such a bandwidth [see, e.g., Stefanski and Carroll (1990) and De- 
laigle and Gijbels (2004)]. Another popular approach is based on wavelets 
[see, e.g., Pensky and Vidakovic (1999)]. For both Fourier inversion methods 
and wavelet methods it is difficult to incorporate shape constraints on the 
distribution of interest in the estimation procedure. For example, density 
estimates can easily become negative. 

Another method that can be employed to estimate the distribution func- 
tion F is maximum likelihood. Based on the density (1) of Zj, the log like- 
lihood of a density g (or equivalent distribution function F) is easily com- 
puted. A maximum likelihood estimator is then defined as the maximizer 
of the log likelihood function over an appropriate class of distribution
functions. See, for example, Groeneboom and Wellner (1992) for the case where
it is maximized over the class of all distribution functions on $[0, \infty)$. Another
general method to estimate $F$ is least squares. Based on a naive estimator of
$F$ outside the class $\mathcal{F}$ of distribution functions of interest, this estimator is
defined as the minimizer of the $L^2$ distance to this naive estimator over the
class of interest. Typically, maximum likelihood and least squares estimators
do not require a bandwidth. Moreover, shape constraints can quite naturally
be imposed on the estimator by restricting the feasible set of distribution
functions in their definition. This is in contrast to the aforementioned kernel
and wavelet based methods of estimation.

In this paper we estimate the distribution function $F$ under the assumption
that it is concave. More precisely, we assume $F$ to belong to the class

(2)    $\mathcal{F} := \{F \mid F \text{ is a concave distribution function on } [0, \infty)\}$.

We restrict the convolution kernel $k$ to the class of convolution kernels

(3)    $\mathcal{K} = \{k : [0, \infty) \to [0, \infty) : k \text{ is a bounded and decreasing probability density}\}$.

However, as pointed out in side remarks, the existence, characterization and 
consistency results for the maximum likelihood estimator can be extended 
to more general classes of kernel functions at the cost of extra technicalities. 

Our initial motivation to study nonparametric estimators for shape-con- 
strained distribution functions in deconvolution models was the financial 
application studied in Jongbloed, van der Meulen and van der Vaart (2005) . 






There, we find the problem of recovering a unimodal distribution from data 
corrupted with additive noise with a smooth density. The current setting 
with decreasing kernel k is too restrictive to be applicable in that context. 
However, in this simplified model we can obtain asymptotic results for the LS 
estimator. These are of independent interest. To our knowledge, this paper is 
the second setting where the so-called Groeneboom distribution described in 
Groeneboom, Jongbloed and Wellner (2001a) appears in the limit. The first 
setting is that of estimating a convex decreasing density studied in
Groeneboom, Jongbloed and Wellner (2001b). In both situations, the rescaling rate
of the estimator is $n^{2/5}$. We expect that the role played by Chernoff's
distribution [Chernoff (1964)] in situations with cube root $n$ asymptotics [Kim
and Pollard (1990)] is played by the Groeneboom distribution in situations
with $n^{2/5}$ asymptotics. Examples of other estimation problems where we
expect this to happen are that of estimating a log concave density [Dümbgen
and Rufibach (2004)] and that of estimating a concave distribution function
from current status data. (We conjecture that the maximum likelihood esti- 
mator has the same asymptotics as the least squares estimator in the setting 
of this paper.) 

In Section 2 we define two nonparametric estimators for the concave dis- 
tribution function F: the maximum likelihood estimator and a least squares 
estimator. The consistency of both estimators is proved in Section 3. Com- 
putational issues of the estimators are addressed in Section 4. Subsequently, 
we derive an asymptotic local minimax lower bound on the optimal
estimation rate for $F(x_0)$ and $f(x_0)$ in Section 5. In Section 6 we derive the
asymptotic distribution of the random vector $(\hat F_n(x_0), \hat f_n(x_0))$. It turns out
that the asymptotic variance of the LS estimator depends on the functions
$k$ and $f$ in exactly the same way as the minimax lower bound of Section 5.

2. Two nonparametric estimators: definition and properties. In this sec- 
tion we define two nonparametric estimators for F: the maximum likelihood 
(ML) and least squares (LS) estimators. In the context of convex density 
estimation, Groeneboom, Jongbloed and Wellner (2001b) show that the ML 
and LS estimators have the same asymptotic pointwise behavior. The least 
squares estimator, however, is much more tractable to study both from an 
algorithmic and asymptotic point of view. The same phenomenon will be 
seen to occur in the deconvolution setting of this paper. 

2.1. Maximum likelihood. Let 

(4)    $\mathcal{Z}_n = \{Z_1, \ldots, Z_n\}$

be the set of observations. Denoting by $G_n$ the empirical distribution
function of $\mathcal{Z}_n$, the log-likelihood function evaluated at a distribution function
$F$ is given by

(5)    $l_n(F) = \int \log g_F(z)\, dG_n(z)$,

where $g_F$ is defined as the convolution of $k$ and $F$:
$g_F(z) = \int_{[0,\infty)} k(z - x)\, dF(x)$. In Groeneboom and Wellner (1992) it is shown
that the maximizer of this function over the class of all distribution functions is a discrete
distribution function with mass concentrated at the observed data points.
We show that the maximum likelihood estimator of a concave distribution
function $F$, based on a sample of size $n$ from $g_F$, is a proper piecewise linear
distribution function that can only have changes of slope at the observed
data points. We also establish a characterization of the estimator in terms
of inequalities.

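Define the set $\mathcal{F}_{\text{basis}} := \{F_\theta \mid \theta > 0\}$ by

(6)    $F_\theta(x) = \frac{x}{\theta} 1_{[0,\theta]}(x) + 1_{(\theta,\infty)}(x), \quad \theta > 0 \ (x \in \mathbb{R})$,

that is, $F_\theta$ is the distribution function of a uniformly distributed random
variable on $[0, \theta]$. Any $F \in \mathcal{F}$ can be written as a mixture of elements from
$\mathcal{F}_{\text{basis}}$: there exists a probability measure $\mu = \mu_F$ on $[0, \infty)$ such that
$F = \int_{[0,\infty)} F_\theta\, d\mu_F(\theta)$. In fact, $d\mu_F(\theta) = -\theta\, dF'(\theta)$. This implies

$g_F(x) = \int_{[0,\infty)} \int_{[0,\infty)} k(x - u)\, dF_\theta(u)\, d\mu_F(\theta) = \int_{[0,\infty)} g_\theta(x)\, d\mu_F(\theta)$,

where

(7)    $g_\theta(x) := \int_{[0,\infty)} k(x - u)\, dF_\theta(u) = \frac{1}{\theta}\bigl(K(x) - K(x - \theta)\bigr), \quad \theta > 0 \ (x \in \mathbb{R})$.

($K$ denotes the primitive of $k$.) Thus we can reformulate the maximum
likelihood problem as maximizing $l_n(g) = \int \log g(x)\, dG_n(x)$ over $\mathcal{G}$, where

$\mathcal{G} := \{ g \mid g(\cdot) = \int_{[0,\infty)} g_\theta(\cdot)\, d\mu(\theta) \text{ for some probability measure } \mu \text{ on } [0, \infty) \}$.

Once we know the mixing probability measure $\hat\mu_n$ corresponding to the
maximizer $\hat g_n$, the maximum likelihood estimator for $F$ is given by
$\hat F_n = \int F_\theta\, d\hat\mu_n(\theta)$.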
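
As an illustration of the basis densities $g_\theta(x) = \frac{1}{\theta}(K(x) - K(x - \theta))$ in (7), the following minimal sketch evaluates them for the standard exponential kernel and compares the closed form with a direct numerical integration of $\frac{1}{\theta}\int_0^\theta k(x - u)\,du$. The choice of kernel, the function names and the grid size are assumptions made only for this example.

```python
import math

def k_exp(t):
    """Standard exponential noise density (an illustrative kernel choice)."""
    return math.exp(-t) if t >= 0 else 0.0

def K_exp(t):
    """Primitive of k_exp: K(t) = 1 - exp(-t) for t >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-t) if t >= 0 else 0.0

def g_theta(x, theta, K=K_exp):
    """Closed form (7): density of U + eps with U uniform on [0, theta]."""
    return (K(x) - K(x - theta)) / theta

def g_theta_numeric(x, theta, k=k_exp, m=20000):
    """Midpoint-rule approximation of (1/theta) * int_0^theta k(x - u) du."""
    h = theta / m
    return sum(k(x - (i + 0.5) * h) for i in range(m)) * h / theta

if __name__ == "__main__":
    for x in (0.3, 1.0, 2.5):
        print(x, g_theta(x, 1.5), g_theta_numeric(x, 1.5))
```

Only the primitive $K$ has to be supplied for another kernel in the class (3); the numerical version is included purely as a cross-check.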

Theorem 2.1. Let $k \in \mathcal{K}$ as defined in (3). Then a maximizer $\hat F_n$ of
(5) over the class of all concave distribution functions on $[0, \infty)$ exists and
can be chosen to be a piecewise linear distribution function with bend points
concentrated on the set of observations $\mathcal{Z}_n$.
Proof. We start by showing that if $\hat F_n$ exists, there is a version that
is piecewise linear with bend points concentrated on $\{Z_1, \ldots, Z_n\}$. Consider
an arbitrary concave distribution function $F$ and its linearly interpolated
version (between the observed $Z_j$'s) $\bar F$. Then, writing $Z_{(0)} = 0$, we get for
each $i$

(8)    $g_{\bar F}(Z_{(i)}) = \sum_{j=1}^{i} \int_{(Z_{(j-1)}, Z_{(j)}]} k(Z_{(i)} - y)\, d\bar F(y) \ge \sum_{j=1}^{i} \int_{(Z_{(j-1)}, Z_{(j)}]} k(Z_{(i)} - y)\, dF(y) = g_F(Z_{(i)})$,

implying that $l_n(F) \le l_n(\bar F)$. Inequality (8) holds because we can write for
each summand (treating the $Z_{(j)}$'s as fixed and denoting the distribution of
a uniformly distributed random variable $U$ on $[0, 1]$ by $J$)

$\int_{(Z_{(j-1)}, Z_{(j)}]} k(Z_{(i)} - y)\, dF(y) = E_F\, k(Z_{(i)} - Y) 1_{(Z_{(j-1)}, Z_{(j)}]}(Y)$
$\quad = E_J\, k(Z_{(i)} - F^{-1}(U)) 1_{(Z_{(j-1)}, Z_{(j)}]}(F^{-1}(U))$
$\quad \le E_J\, k(Z_{(i)} - \bar F^{-1}(U)) 1_{(Z_{(j-1)}, Z_{(j)}]}(\bar F^{-1}(U))$
$\quad = E_{\bar F}\, k(Z_{(i)} - Y) 1_{(Z_{(j-1)}, Z_{(j)}]}(Y)$
$\quad = \int_{(Z_{(j-1)}, Z_{(j)}]} k(Z_{(i)} - y)\, d\bar F(y)$.

Here we use that $F^{-1}(u) \in (Z_{(j-1)}, Z_{(j)}] \iff \bar F^{-1}(u) \in (Z_{(j-1)}, Z_{(j)}]$ and
that for each $u \in (0, 1)$, $F^{-1}(u) \le \bar F^{-1}(u)$, implying that
$k(Z_{(i)} - F^{-1}(u)) \le k(Z_{(i)} - \bar F^{-1}(u))$ because $k$ is decreasing.

To show existence of $\hat F_n$, we only have to consider distribution functions
having bend points at the observations and these can be parameterized as
follows:

$F = \sum_{j=1}^{n} r_j F_{Z_j}$

with $r \in H = \{ r \in \mathbb{R}^n : 0 \le r_j \le 1 \text{ for } 1 \le j \le n \text{ and } \sum_{j=1}^{n} r_j = 1 \}$.

Expressed in terms of $r$, the log likelihood function becomes
$n^{-1} \sum_{i=1}^{n} \log\bigl(\sum_{j=1}^{n} r_j g_{Z_j}(Z_i)\bigr)$, which is a concave function that attains a finite
value for some feasible $r$. Since $H$ is compact, existence follows. □
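
The finite-dimensional problem at the end of the proof (maximize $n^{-1}\sum_i \log(\sum_j r_j g_{Z_j}(Z_i))$ over the simplex $H$) can be explored numerically. The sketch below uses plain EM updates for mixture weights, which is a substitute chosen for brevity and is not the support-reduction algorithm of Section 4. The exponential kernel, the toy data and all function names are assumptions of this example.

```python
import math

def K_exp(t):
    """Primitive of the standard exponential density (illustrative kernel)."""
    return 1.0 - math.exp(-t) if t >= 0 else 0.0

def g_theta(x, theta):
    """Basis density g_theta(x) = (K(x) - K(x - theta)) / theta."""
    return (K_exp(x) - K_exp(x - theta)) / theta

def loglik(r, A):
    """Normalized log likelihood n^{-1} sum_i log(sum_j r_j A[i][j])."""
    return sum(math.log(sum(rj * aij for rj, aij in zip(r, row)))
               for row in A) / len(A)

def em_weights(Z, iters=300):
    """EM iterations for the weights r on the simplex H.

    A[i][j] = g_{Z_j}(Z_i). Each update multiplies r_j by the average of
    g_{Z_j}(Z_i) / g_r(Z_i) over the sample, which keeps sum(r) = 1.
    """
    n = len(Z)
    A = [[g_theta(zi, zj) for zj in Z] for zi in Z]
    r = [1.0 / n] * n
    history = [loglik(r, A)]
    for _ in range(iters):
        mix = [sum(rj * aij for rj, aij in zip(r, row)) for row in A]
        r = [r[j] * sum(A[i][j] / mix[i] for i in range(n)) / n
             for j in range(n)]
        history.append(loglik(r, A))
    return r, history, A

if __name__ == "__main__":
    Z = [0.4, 0.9, 1.3, 2.1, 3.0, 4.2]  # toy observations, not real data
    r, history, A = em_weights(Z)
    print([round(rj, 4) for rj in r], history[0], history[-1])
```

At a fixed point of these updates, every index $j$ with $r_j > 0$ satisfies $n^{-1}\sum_i g_{Z_j}(Z_i)/g_r(Z_i) = 1$, so the multiplier in each update is an empirical version of the integral $\int g_\theta/g_F\,dG_n$ evaluated at $\theta = Z_j$.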
Remark 2.2. Existence and piecewise linearity with at most $n$ changes
of slope of $\hat F_n$ can also be proved under the less restrictive assumption that
$k$ is initially nondecreasing on $\mathbb{R}$, that is, under the assumption that
there exists a constant $M \in \mathbb{R}$ such that $k$ is nondecreasing on $(-\infty, M)$.
In that situation we should allow $\hat F_n$ to have a point mass at zero. This
implies that $\mathcal{F}_{\text{basis}}$ should be augmented with the function $1_{[0,\infty)}$. In this
more general setting, the bend points of the MLE can be outside the set of
observed data points.

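Theorem 2.3 (Characterization of the MLE). The (piecewise linear)
distribution function $\hat F$ maximizes (5) over the class $\mathcal{F}$ if and only if

(9)    $\int \frac{g_\theta(z)}{g_{\hat F}(z)}\, dG_n(z) \le 1$

for all $\theta > 0$. Here $g_\theta$ is as defined in (7). In fact, equality holds for those $\theta$
that belong to the set of bend points of $\hat F$.

Proof. First necessity. Suppose $\hat F$ maximizes the log likelihood. Then,
for all $\theta > 0$ and $\varepsilon \in [0, 1]$,

(10)    $\hat F + \varepsilon(F_\theta - \hat F) \in \mathcal{F} \ \Rightarrow\ \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\bigl(l_n(\hat F + \varepsilon(F_\theta - \hat F)) - l_n(\hat F)\bigr) \le 0$.

Writing out this limit gives (9). That the inequality actually is an equality
for those points where $\mu_{\hat F}(\{\theta\}) > 0$ follows immediately upon noting that
for those points $\hat F + \varepsilon(F_\theta - \hat F) \in \mathcal{F}$ also for small negative values of $\varepsilon$.

For sufficiency, let $F = \int F_\theta\, d\mu_F(\theta)$ be an arbitrary (sub-)distribution
function in $\mathcal{F}$. Then, using $\log u \le u - 1$, inequality (9) and the fact that
$\mu_F$ has total mass at most one,

$l_n(F) - l_n(\hat F) = \int \log \frac{g_F(z)}{g_{\hat F}(z)}\, dG_n(z) \le \int \Bigl(\frac{g_F(z)}{g_{\hat F}(z)} - 1\Bigr)\, dG_n(z)$
$\quad = \int \frac{\int g_\theta(z)\, d\mu_F(\theta)}{g_{\hat F}(z)}\, dG_n(z) - 1$
$\quad = \int \Bigl( \int \frac{g_\theta(z)}{g_{\hat F}(z)}\, dG_n(z) \Bigr)\, d\mu_F(\theta) - 1 \le 0$. □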

2.2. Least squares. We now turn to an alternative nonparametric
estimator for $F$, the least squares (LS) estimator. In order to define this
estimator we need a "type of inverse" for the kernel $k$. In Lemma 2.4 we
will prove that under mild conditions there exists a function $p$ such that
$(p * k)(x) = \mathrm{id}_+(x) := x 1_{[0,\infty)}(x)$. We now explain how we can use this result
to define a least squares estimator. First note that

$(p * g)(x) = ((p * k) * dF)(x) = (\mathrm{id}_+ * dF)(x) = \int_0^x F(u)\, du$,
which implies that the survival function of the random variable $X$, defined
by $s = 1 - F$, satisfies

$s(x) = U'(x)$ with $U(x) := x - (p * g)(x)$.

Define an empirical estimate of $U$ by

$U_n(x) = x - (p * dG_n)(x)$,

and denote the class of survival functions associated with $\mathcal{F}$ by

$\mathcal{S} := \{ s \text{ on } [0, \infty) : s \text{ is nonnegative, convex, decreasing and } s(0) \in (0, 1] \}$.

We would like to define the LS estimator $\hat s_n$ by $\operatorname{argmin}_{s \in \mathcal{S}} Q_n(s)$, where

(11)    $Q_n(s) = \frac{1}{2} \int_0^\infty s(x)^2\, dx - \int_0^\infty s(x)\, dU_n(x)$.

This definition is motivated by considering the $L^2$-distance between $s$ and
(the nonexistent) $U_n'$. In the decomposition

$\int (s(x) - U_n'(x))^2\, dx = \int s(x)^2\, dx - 2 \int s(x) U_n'(x)\, dx + \int U_n'(x)^2\, dx$,

the last term does not depend on $s$, and $\int s(x) U_n'(x)\, dx$ is interpreted as
$\int s(x)\, dU_n(x)$. Although not stated explicitly there, the isotonic inverse
estimator studied in Van Es, Jongbloed and Van Zuijlen (1998) can be
interpreted in the same way as the LS estimator considered here. The only
difference is that $Q_n$ is minimized over all decreasing rather than convex
decreasing functions on $[0, \infty)$.

The main reason for considering the survival function $s$ instead of the
distribution function $F$ in the definition of the least squares estimator is that the
survival function is convex and decreasing and, hence, we can exploit
results from Groeneboom, Jongbloed and Wellner (2001b) more naturally.
We now provide conditions for the existence of the reciprocal kernel $p$.

Lemma 2.4. To each kernel function $k \in \mathcal{K}$ defined in (3), there
corresponds a reciprocal kernel $p$ (or "type 1 resolvent"), solving the first kind
Volterra integral equation of convolution type

(12)    $(p * k)(x) := \int_0^x p(x - y) k(y)\, dy = x 1_{[0,\infty)}(x)$.

This function $p$ is increasing, equals zero on $(-\infty, 0)$ and satisfies
$p(0+) = 1/k(0+)$. Moreover, $\lim_{t \to \infty} t^{-1} p(t) = 1$. If, in addition, $k$ is smooth in the
sense that it can be written as

(13)    $k(x) = k(0+) - \int_0^x \kappa(y)\, dy = \int_x^\infty \kappa(y)\, dy$

for a Lipschitz continuous nonnegative function $\kappa$ on $(0, \infty)$, then the
function $p$ admits a representation

(14)    $p(x) = \frac{1}{k(0+)} + \int_0^x \ell(y)\, dy, \quad x \ge 0$,

for a nonnegative continuous function $\ell$ on $(0, \infty)$ that is Lipschitz
continuous on each bounded interval.



Remark 2.5. For some kernels $k \in \mathcal{K}$, $p$ is explicitly known. For
example, $p(t) = (1 + t) 1_{[0,\infty)}(t)$ for the standard exponential $k$ and
$p(t) = (1 + \lfloor t \rfloor) 1_{[0,\infty)}(t)$ for the uniform$(0, 1)$ kernel $k$. For other situations $p$ can
be easily approximated numerically using numerical integration procedures.
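
The two explicit reciprocal kernels of Remark 2.5 can be checked against the defining relation $(p * k)(x) = x$ for $x \ge 0$. The sketch below does this with a simple midpoint rule; the function names, evaluation points and grid size are assumptions made for illustration only.

```python
import math

def conv_pk(p, k, x, m=4000):
    """Midpoint-rule approximation of (p * k)(x) = int_0^x p(x - y) k(y) dy."""
    h = x / m
    return sum(p(x - (i + 0.5) * h) * k((i + 0.5) * h) for i in range(m)) * h

# Standard exponential kernel and its reciprocal kernel p(t) = 1 + t.
k_exp = lambda y: math.exp(-y)
p_exp = lambda t: 1.0 + t

# Uniform(0, 1) kernel and its reciprocal kernel p(t) = 1 + floor(t).
k_unif = lambda y: 1.0 if 0.0 <= y <= 1.0 else 0.0
p_unif = lambda t: 1.0 + math.floor(t)

if __name__ == "__main__":
    for x in (0.5, 1.7, 3.2):
        print(x, conv_pk(p_exp, k_exp, x), conv_pk(p_unif, k_unif, x))
```

For the uniform kernel both $p$ and $k$ have jumps, so the midpoint rule is only accurate to the order of the grid width there; for the exponential kernel the agreement is much tighter.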



Proof of Lemma 2.4. For the first part we refer to Van Es, Jongbloed
and Van Zuijlen (1998) and Pipkin (1991), Chapter 6. For the result on
smooth kernels, consider the Volterra convolution integral equation of the
second kind

(15)    $\ell(t) = \int_0^t \frac{\kappa(t - u)}{k(0+)}\, \ell(u)\, du + \frac{\kappa(t)}{k(0+)^2}$

and note that if $\ell$ solves this equation, $p$ defined in (14) solves (12). Existence
of a continuous solution to (15) is guaranteed by Theorem 3.5 in Gripenberg,
Londen and Staffans (1990) because $\kappa$ is continuous. Using Lipschitz
continuity of $\kappa$, Lipschitz continuity of $\ell$ follows. Indeed, denote the Lipschitz
constant of $\kappa$ by $K$, and let $t \in [0, M)$ and $h > 0$ sufficiently small (so that $t + h \le M$). Then

$|\ell(t + h) - \ell(t)| = \Bigl| \int_0^t \frac{\kappa(t - u + h) - \kappa(t - u)}{k(0+)}\, \ell(u)\, du + \int_t^{t+h} \frac{\kappa(t + h - u)}{k(0+)}\, \ell(u)\, du + \frac{\kappa(t + h) - \kappa(t)}{k(0+)^2} \Bigr|$
$\quad \le \Bigl( \frac{K M}{k(0+)} \sup_{[0,M]} |\ell(u)| + \frac{1}{k(0+)} \sup_{[0,M]} |\kappa(u)| \sup_{[0,M]} |\ell(u)| + \frac{K}{k(0+)^2} \Bigr) h = C_M h$.

The result now follows from continuity of both $\ell$ and $\kappa$ on the compact
interval $[0, M]$. □



Assumption 2.6. Throughout the rest of the paper we will assume that
$k$ admits representation (13) with Lipschitz continuous nonnegative function $\kappa$.



Remark 2.7. Note that $U_n$ is a right-continuous function. The limit
behavior of $p$ implies that $U_n(x) = o(x)$, as $x \to \infty$. It is obvious that $U_n(x) = x$
for $x \in [0, Z_{(1)})$ and that $U_n$ has negative jumps of size $\frac{1}{n} p(0)$ at all
observation points.
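
The properties listed in Remark 2.7 are easy to see in a small numerical sketch of $U_n(x) = x - (p * dG_n)(x) = x - \frac{1}{n}\sum_{i} p(x - Z_i)$, where $p$ vanishes on the negative half-line. The reciprocal kernel $p(t) = 1 + t$ of the standard exponential kernel, the toy data and the function name are assumptions of this example.

```python
def U_n(x, Z, p=lambda t: 1.0 + t):
    """Empirical U_n(x) = x - (1/n) * sum over Z_i <= x of p(x - Z_i).

    The default p is the reciprocal kernel of the standard exponential
    density; p is treated as zero for negative arguments, which is why
    only observations Z_i <= x contribute (right-continuity).
    """
    return x - sum(p(x - z) for z in Z if z <= x) / len(Z)

if __name__ == "__main__":
    Z = [0.5, 1.2, 2.0, 3.1]  # toy observations, not real data
    print(U_n(0.3, Z))                          # equals x before the first observation
    print(U_n(1.2, Z) - U_n(1.2 - 1e-9, Z))     # jump of about -p(0)/n = -0.25
    print(U_n(1e6, Z) / 1e6)                    # U_n(x) = o(x) for large x
```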

There are two natural ways to define the least squares estimator. The
first is to define it as the minimizer of $Q_n$ over the set $\mathcal{S}$, as done above. A
drawback of this approach is that additional assumptions on $k$ are needed
to show that the estimator $\hat s_n$ is well defined and to derive its asymptotic
properties. We follow an alternative approach (avoiding these conditions)
where we define the least squares estimator as the minimizer of $Q_n$ over the
set

(16)    $\mathcal{S}_n = \{ s : s \text{ convex and decreasing},\ s(0) = 1,\ s(Z_{(n)}) = 0,\ s \text{ piecewise linear with kinks only in } \mathcal{Z}_n \}$.

Theorem 2.8. The least squares estimator $\hat s_n$, defined as the minimizer
of $Q_n$ over $\mathcal{S}_n$, exists uniquely.

Proof. Uniqueness is immediate from strict convexity of $Q_n$. For
existence, note that any $s \in \mathcal{S}_n$ can be written as $s = \sum_{i=1}^{n} a_i s_{Z_i}$, where
$s_\theta = 1 - F_\theta$ [with $F_\theta$ defined in (6)], all $a_i \in [0, 1]$ and $\sum_{i=1}^{n} a_i = 1$. Hence, the
minimization problem is equivalent to that of minimizing

$(a_1, \ldots, a_n) \mapsto \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \int s_{Z_i} s_{Z_j}\, dx - \sum_{i=1}^{n} a_i \int s_{Z_i}\, dU_n$

over the set $C = \{ a_i \in [0, 1]\ (i = 1, \ldots, n),\ \sum_{i=1}^{n} a_i = 1 \}$. The existence now
follows from the compactness of $C$ and the continuity of the mapping in the
preceding display. □

Remark 2.9. The following argument shows why we can restrict the
minimization to functions that equal one at zero. To show that $\hat s_n(0) = 1$,
note that the integral in objective function (11) can be split in the regions
$[0, Z_{(1)})$ and $[Z_{(1)}, Z_{(n)}]$. The first part is $\frac{1}{2} \int_0^{Z_{(1)}} s(x)(s(x) - 2)\, dx$, where the
convex integrand is minimized pointwise by taking $s(x) = 1$. Hence, for
any $s \in \mathcal{S}$ with $s(0) < 1$, the objective function can be decreased by moving
$s$ on $[0, Z_{(1)})$ as closely as possible to one. This boils down to changing it to
the linear function connecting $(0, 1)$ with $(Z_{(1)}, s(Z_{(1)}))$.

We now state necessary and sufficient conditions that characterize $\hat s_n$.

Theorem 2.10. The function $\hat s$ minimizes $Q_n$ over all functions in $\mathcal{S}_n$
if and only if for all $\theta \in \mathcal{Z}_n$

(17)    $H_n(\theta; \hat s) := \int_{t=0}^{\theta} \int_{v=0}^{t} \hat s(v)\, dv\, dt - \theta \Bigl( \int_0^\infty \hat s(t)^2\, dt - \int_0^\infty \hat s(t)\, dU_n(t) \Bigr) \ge \int_0^\theta U_n(t)\, dt =: Y_n(\theta)$,

with equality whenever $\theta$ is a kink of $\hat s$.

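Proof. For necessity, assume $\hat s$ minimizes $Q_n$ over $\mathcal{S}_n$. Because
$\hat s + \varepsilon(s_\theta - \hat s) \in \mathcal{S}_n$ for all $\theta \in \mathcal{Z}_n$ and $\varepsilon \in [0, 1]$, and $\hat s$ minimizes $Q_n$ over $\mathcal{S}_n$, we
have that

$\lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\bigl(Q_n(\hat s + \varepsilon(s_\theta - \hat s)) - Q_n(\hat s)\bigr) \ge 0$.

Writing out this limit, we get

$\int \hat s(x)(s_\theta(x) - \hat s(x))\, dx - \int (s_\theta(x) - \hat s(x))\, dU_n(x) \ge 0 \quad \forall\, \theta \in \mathcal{Z}_n$.

Denote, for the moment, by $\hat S$ the primitive of $\hat s$, which is zero at zero. Then
we have

$\int_0^\infty \hat s(x) s_\theta(x)\, dx = \int_0^\infty s_\theta(x)\, d\hat S(x) = \frac{1}{\theta} \int_0^\theta \hat S(x)\, dx$

and

$\int_0^\infty s_\theta(x)\, dU_n(x) = \frac{1}{\theta} \int_0^\theta U_n(x)\, dx$.

This leads to the necessary inequality for optimality given in (17).

Now, for sufficiency, suppose $\hat s$ satisfies conditions (17). Let
$s = \int s_\theta\, d\mu(\theta) \in \mathcal{S}_n$ be arbitrary. Define the function
$\varepsilon \mapsto \psi(\varepsilon) := Q_n(\hat s + \varepsilon(s - \hat s))$, which is convex
on $[0, 1]$. Moreover, $Q_n(s) = \psi(1) \ge \psi(0) + \psi'(0) = Q_n(\hat s) + \psi'(0)$, where
the derivative is interpreted as right derivative. Hence, $\hat s$ minimizes $Q_n$ over
$\mathcal{S}_n$ if $\psi'(0) \ge 0$. To see that this holds, note that

$\psi'(0) = \int_{\theta > 0} \frac{1}{\theta} \bigl( H_n(\theta; \hat s) - Y_n(\theta) \bigr)\, d\mu(\theta) \ge 0$.

If we take $s = \hat s$, then we obtain an equality in this display. This implies
that, for all $\theta$ where $\hat s$ has a kink, $H_n(\theta; \hat s) = Y_n(\theta)$. □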
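
The quadratic program from the proof of Theorem 2.8 can be solved numerically in a few lines. The sketch below uses the Frank-Wolfe method with exact line search, a substitute chosen for brevity and not the support-reduction algorithm of Section 4. It assumes the standard exponential kernel, for which $p(t) = 1 + t$ and hence $Y_n(\theta) = \theta^2/2 - \frac{1}{n}\sum_{Z_i \le \theta}\bigl[(\theta - Z_i) + (\theta - Z_i)^2/2\bigr]$. It also uses $\int s_a s_b\,dx = a/2 - a^2/(6b)$ for $a \le b$ and $\int s_\theta\,dU_n = Y_n(\theta)/\theta$. All names and the toy data are assumptions of this example.

```python
def gram(a, b):
    """int_0^inf s_a(x) s_b(x) dx for s_theta(x) = max(1 - x/theta, 0)."""
    lo, hi = min(a, b), max(a, b)
    return lo / 2.0 - lo * lo / (6.0 * hi)

def Y_n(theta, Z):
    """Y_n(theta) = int_0^theta U_n(x) dx for the exponential kernel, p(t) = 1 + t."""
    jumps = sum((theta - z) + 0.5 * (theta - z) ** 2 for z in Z if z <= theta)
    return 0.5 * theta ** 2 - jumps / len(Z)

def ls_weights(Z, iters=2000):
    """Frank-Wolfe iterations for min 0.5 a'Ga - b'a over the simplex C."""
    n = len(Z)
    G = [[gram(zi, zj) for zj in Z] for zi in Z]
    b = [Y_n(z, Z) / z for z in Z]          # b_j = int s_{Z_j} dU_n

    def q(a):
        quad = sum(a[i] * G[i][j] * a[j] for i in range(n) for j in range(n))
        return 0.5 * quad - sum(bi * ai for bi, ai in zip(b, a))

    # Start at the best single basis function, so q can only decrease from there.
    start = min(range(n), key=lambda j: 0.5 * G[j][j] - b[j])
    a = [1.0 if j == start else 0.0 for j in range(n)]
    for _ in range(iters):
        grad = [sum(G[i][j] * a[j] for j in range(n)) - b[i] for i in range(n)]
        j = min(range(n), key=lambda i: grad[i])
        d = [(1.0 if i == j else 0.0) - a[i] for i in range(n)]
        gap = -sum(gi * di for gi, di in zip(grad, d))
        curv = sum(d[i] * G[i][m] * d[m] for i in range(n) for m in range(n))
        if gap <= 1e-14 or curv <= 0.0:
            break
        step = min(1.0, gap / curv)          # exact line search, clipped to [0, 1]
        a = [ai + step * di for ai, di in zip(a, d)]
    return a, q, G, b

if __name__ == "__main__":
    Z = [0.4, 0.9, 1.3, 2.1, 3.0, 4.2]      # toy observations, not real data
    a, q, G, b = ls_weights(Z)
    print([round(ai, 4) for ai in a], q(a))
```

In this parametrization the quantity $(Ga - b)_j - a^\top(Ga - b)$ equals $(H_n(Z_j; s) - Y_n(Z_j))/Z_j$, so the optimality conditions of Theorem 2.10 correspond to the usual first-order conditions of the quadratic program on the simplex.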



Figures 1 and 2 show the maximum likelihood estimator and least squares
estimator for the case that the "true" distribution function $F$ equals
$F(x) = \min(\sqrt{x/5}, 1)$ $(x \ge 0)$. In Figure 1 the noise is standard exponentially
distributed, and in Figure 2 the noise is sampled from the distribution with
density $k(x) = 2(1 - x) 1_{[0,1]}(x)$. The sample sizes were taken equal to 10
and 100. The estimators were calculated using the algorithms described in
Section 4. Figure 3 gives a plot corresponding to the left-hand side picture
of Figure 1. It shows that the MLE and LSE satisfy the characterizations of
Theorems 2.3 and 2.10, respectively.
Fig. 1. Deconvolution with $k(x) = e^{-x} 1_{[0,\infty)}(x)$. Left: $n = 10$. Right: $n = 100$. True: red
dotted; MLE: blue solid; LSE: black dash-dotted.

Fig. 2. Deconvolution with $k(x) = 2(1 - x) 1_{[0,1]}(x)$. Left: $n = 10$. Right: $n = 100$. True:
red dotted; MLE: blue solid; LSE: black dash-dotted.

Fig. 3. Deconvolution with $k(x) = e^{-x} 1_{[0,\infty)}(x)$ ($n = 10$). The curves show that the
characterizations of the MLE and LSE, as given in Theorems 2.3 and 2.10, respectively, are
satisfied. MLE: blue solid; LSE: black dash-dotted.

3. Consistency of the estimators. In Theorems 3.1 and 3.3 we prove
consistency of the maximum likelihood and least squares estimators,
respectively.

3.1. Maximum likelihood.

Theorem 3.1. Let $k \in \mathcal{K}$ satisfy Assumption 2.6. Then, almost surely,
$\|\hat F_n - F_0\|_\infty \to 0$. That is, the MLE is strongly uniformly consistent. In
addition, we have for all $x > 0$

(18)    $F_0^l(x) \ge \limsup_{n \to \infty} \hat F_n^l(x) \ge \liminf_{n \to \infty} \hat F_n^r(x) \ge F_0^r(x)$.

Here the superscripts "l" and "r" denote left and right derivatives,
respectively.

Proof. If $\hat F_n$ maximizes $l_n$ over $\mathcal{F}$, then, by Theorem 2.3,

(19)    $\int \frac{g_\theta(z)}{g_{\hat F_n}(z)}\, dG_n(z) \le 1$ for all $\theta > 0$.

By the Glivenko-Cantelli theorem, if $\Omega_0 := \{ \|G_n(\cdot, \omega) - G_0\|_\infty \to 0 \}$, where
$G_0$ is the distribution function corresponding to $g_{F_0}$, then $P(\Omega_0) = 1$. Fix
$\omega \in \Omega_0$.
Choose an arbitrary subsequence $(m)$ of $(n)$. Using the Helly selection
principle, a subsequence $(l)$ of $(m)$ and a concave subdistribution function
$\tilde F$ on $[0, \infty)$ can be extracted such that $\hat F_l(x)$ converges to $\tilde F(x)$ for all
$x > 0$. By Lemma A.1 in the Appendix, this vague convergence implies for
the corresponding convolution densities $\hat g_l = g_{\hat F_l}$ and (sub)density $\tilde g = g_{\tilde F}$
that for all closed intervals $I$ in $(0, \infty)$, $\sup_{z \in I} |\hat g_l(z) - \tilde g(z)| \to 0$ as $l \to \infty$.
Following exactly the argument of Theorem 3.2 in Groeneboom, Jongbloed
and Wellner (2001b), it can be shown that necessarily $g_0 = \tilde g$.

Hence, any subsequence of the sequence $(\hat F_n)_n$ has a further subsequence
$(\hat F_l)_l$ with $\hat F_l \to \tilde F$ for some $\tilde F$. Furthermore, we saw that $\tilde g = g_{\tilde F} = g_0 = g_{F_0}$.
This implies $\tilde F = F_0$, so there is only one possible limit for the subsequence.
Therefore, for all $\omega \in \Omega_0$, $\hat F_n(\omega) \to F_0$. Since $F_0$ is concave, it is continuous.
This implies that almost surely $\|\hat F_n - F_0\|_\infty \to 0$, as $n \to \infty$. The statement in
(18) is a consequence of Marshall's lemma [Robertson, Wright and Dykstra
(1988), page 332]. □



Remark 3.2. If we consider the more general setting mentioned in Re- 
mark 2.2, then the preceding argument can be extended to prove consistency 
for this case as well. 
3.2. Least squares. Next we prove consistency for the least squares
estimator. Let $U_0(x) = \int_0^x s_0(y)\, dy$ and define

$Q_0(s) = \frac{1}{2} \int_0^\infty s(x)^2\, dx - \int_0^\infty s(x)\, dU_0(x) = \frac{1}{2} \int_0^\infty (s(x) - s_0(x))^2\, dx - \frac{1}{2} \int_0^\infty s_0(x)^2\, dx$.



Theorem 3.3. Assume $s_0 \in \mathcal{S}$. If we denote the $L^2$-norm of functions
on $[0, \infty)$ by $\|\cdot\|_2$, then $\|\hat s_n - s_0\|_2 \to 0$ and $\|\hat s_n - s_0\|_\infty \to 0$ almost surely, as $n \to \infty$.

Proof. Note that

$\mathcal{S}_1 \subset \mathcal{S}_2 \subset \cdots \subset \mathcal{S}_n \subset \cdots \subset \mathcal{S} \subset L^2[0, \infty)$.

For each $i \ge 1$, the set $\mathcal{S}_i$ is closed with respect to the topology induced by
the $L^2$-norm. This follows from the fact that $s \in \mathcal{S}_i$ is bounded and piecewise
linear, with kinks at at most $i$ points. Furthermore, $\mathcal{S}_i$ is convex. Therefore,
the $L^2$-projection of $s_0 \in \mathcal{S}$ onto $\mathcal{S}_i$ exists. Denote the latter by $\Pi_i s_0$. Using
the fact that $\hat s_n$ minimizes $Q_n$ over $\mathcal{S}_n$, we get

$\frac{1}{2} \|\hat s_n - s_0\|_2^2 = Q_0(\hat s_n) + \frac{1}{2} \|s_0\|_2^2$
$\quad = Q_0(\hat s_n) - Q_n(\hat s_n) + Q_n(\hat s_n) + \frac{1}{2} \|s_0\|_2^2$
$\quad \le Q_0(\hat s_n) - Q_n(\hat s_n) + Q_n(\Pi_n s_0) + \frac{1}{2} \|s_0\|_2^2$
$\quad = Q_0(\hat s_n) - Q_n(\hat s_n) + Q_n(\Pi_n s_0) - Q_0(\Pi_n s_0) + Q_0(\Pi_n s_0) + \frac{1}{2} \|s_0\|_2^2$
$\quad \le 2 \sup_{s \in \mathcal{S}} |Q_0(s) - Q_n(s)| + Q_0(\Pi_n s_0) + \frac{1}{2} \|s_0\|_2^2$
$\quad = 2 \sup_{s \in \mathcal{S}} |Q_0(s) - Q_n(s)| + \frac{1}{2} \|\Pi_n s_0 - s_0\|_2^2$.

On the other hand, we have that

$U_0(x) - U_n(x) = \int_0^x p(x - y)\, d(G_n - G_0)(y) = \frac{1}{k(0)} (G_n - G_0)(x) + \int_0^x \Bigl( \int_0^{x - y} \ell(u)\, du \Bigr)\, d(G_n - G_0)(y)$,

where the second equality follows from equation (14). This implies that, for
$s \in \mathcal{S}$,

$Q_n(s) - Q_0(s) = \int \Bigl( \frac{s(x)}{k(0)} + \int_x^\infty s(y) \ell(y - x)\, dy \Bigr)\, d(G_n - G_0)(x)$.

Substituting this equality in the preceding inequality gives

$\|\hat s_n - s_0\|_2^2 \le 4 \sup_{s \in \mathcal{S}} \Bigl| \int \Bigl( \frac{s(x)}{k(0)} + \int_x^\infty s(y) \ell(y - x)\, dy \Bigr)\, d(G_n - G_0)(x) \Bigr| + \|\Pi_n s_0 - s_0\|_2^2$.

Since $\bigcup_{n=1}^{\infty} \mathcal{S}_n = \mathcal{S}$ almost surely, $\|\Pi_n s_0 - s_0\|_2$ tends to zero almost surely,
as $n \to \infty$. If the class

$\Bigl\{ x \mapsto \frac{s(x)}{k(0)} + \int_x^\infty s(y) \ell(y - x)\, dy,\ s \in \mathcal{S} \Bigr\}$

is Glivenko-Cantelli, then the first term tends to zero as well. That this
class is indeed Glivenko-Cantelli can be seen as follows. First, the class $\mathcal{S}$
itself is Glivenko-Cantelli [Example 3.7.1 in Van de Geer (2000)]. Moreover,
$\{ v : v(x) = \int_0^\infty s(x + y) \ell(y)\, dy,\ s \in \mathcal{S} \} \subset \mathcal{S}$ is Glivenko-Cantelli for the same
reason. Hence, by the triangle inequality, the class consisting of sums of two
functions, one from each class, is Glivenko-Cantelli, too.

Now suppose that $\hat s_n$ does not converge to $s_0$ pointwise. Then there
exist a point $x > 0$, an $\varepsilon > 0$ and a subsequence of $n$, such that for all $n$
along this subsequence $|\hat s_n(x) - s_0(x)| > \varepsilon$. Because all $\hat s_n$ and $s_0$ are convex
and decreasing, there is a fixed neighborhood of $x$, such that for all $y$ in
this neighborhood and $n$ along this subsequence, $|\hat s_n(y) - s_0(y)| > \varepsilon/2$. This
implies that $\|\hat s_n - s_0\|_2$ does not converge to zero. Hence, with probability
one $\hat s_n(x) \to s_0(x)$ for all $x$, as $n \to \infty$. Uniform consistency follows from this
pointwise result because $\hat s_n$ and $s_0$ are convex and decreasing (the proof is
similar to the proof of the classical Glivenko-Cantelli theorem). □

4. Computing the estimators by a support-reduction algorithm. Both 
estimators can be computed by the support-reduction algorithm as discussed 
in Groeneboom, Jongbloed and Wellner (2008) . This is an iterative algorithm 
for minimizing a convex objective function Q over a convex cone or convex 
hull generated by a parametrized function class. Suppose the objective
function is denoted by $Q$, and let the convex cone $\mathcal{F}$ generated by the functions
$\{f_\theta : \theta \in \Theta\}$ be given by

$\mathcal{F} = \Bigl\{ f : f(x) = \int f_\theta(x)\, d\mu(\theta),\ \mu \text{ is a positive finite measure on } \Theta \Bigr\}$,

where $\Theta$ is some subset of $\mathbb{R}$. (If we minimize over a convex hull, then the
measure $\mu$ is a probability measure.) We aim to compute $\hat f = \operatorname{argmin}_{f \in \mathcal{F}} Q(f)$.

Both the computation of the ML estimator and the LS estimator fit within
this framework. For the MLE we minimize $Q(f) = -\int \log f(x)\, dG_n(x) + \int f(x)\, dx$
over the convex cone generated by the functions $\{g_\theta : \theta \in \mathcal{Z}_n\}$; for
the LSE we minimize $Q(f) = \frac{1}{2} \int f(x)^2\, dx - \int f(x)\, dU_n(x)$ over the convex
hull generated by the functions $\{s_\theta : \theta \in \mathcal{Z}_n\}$. If the solution is given by
$\hat f_n = \int f_\theta\, d\hat\mu_n(\theta)$, then $\hat F_n = \int F_\theta\, d\hat\mu_n(\theta)$.

The main steps of the algorithm are briefly explained in Section 6.1 of 
Jongbloed, van der Meulen and van der Vaart (2005) . For additional details 
we refer to Groeneboom, Jongbloed and Wellner (2008). Computational de- 
tails for the current setup can be found in the Appendix. 

5. Asymptotic lower bound on local minimax risk. In this section, we 
derive an asymptotic lower bound to a local minimax risk for estimating the 
concave distribution function $F_0$ and its (decreasing) derivative $F_0' = f_0$ at
an interior point $x_0 > 0$ of its support. On $f_0$ we impose a local assumption
near the point $x_0$:

(20)    $f_0(x) = f_0(x_0) + f_0'(x_0)(x - x_0)(1 + o(1))$ as $x \to x_0$, and $f_0'$ is continuous at $x_0$.

Moreover, we assume an integrability condition on $k$ and $F_0$ jointly:

(21)    $\int_{x_0}^\infty \frac{k'(x - x_0)^2}{g_{F_0}(x)}\, dx < \infty$.

Define for a fixed kernel function $k$ that can be expressed as in (13) the class
of sampling densities

(22)    $\mathcal{G} = \Bigl\{ g : g(z) = \int_0^z k(z - x) f(x)\, dx,\ z \ge 0, \text{ with } f \text{ a decreasing density on } (0, \infty) \Bigr\}$.

Endow this class of densities with the Hellinger distance,

$H(g, h) = \Bigl( \frac{1}{2} \int_0^\infty \bigl( \sqrt{g(x)} - \sqrt{h(x)} \bigr)^2\, dx \Bigr)^{1/2}$,

and denote by $\mathcal{G}_g$ a subset of $\mathcal{G}$ containing a Hellinger ball of positive radius
around the fixed density $g \in \mathcal{G}$.
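
To make the Hellinger distance concrete, the following sketch approximates $H(g, h) = \bigl(\frac{1}{2}\int_0^\infty (\sqrt{g} - \sqrt{h})^2\,dx\bigr)^{1/2}$ by a midpoint rule on a truncated range. The normalization with the factor $\frac{1}{2}$, the truncation point, the grid size and the two exponential test densities are assumptions of this example; for exponential densities with rates $a$ and $b$ the closed form $H^2 = 1 - 2\sqrt{ab}/(a + b)$ serves as a check.

```python
import math

def hellinger(g, h, upper=60.0, m=60000):
    """Midpoint-rule approximation of the Hellinger distance on [0, upper].

    Uses H(g, h) = sqrt(0.5 * int (sqrt(g) - sqrt(h))^2 dx); the tail beyond
    `upper` is ignored, which is harmless for fast-decaying densities.
    """
    step = upper / m
    total = 0.0
    for i in range(m):
        x = (i + 0.5) * step
        total += (math.sqrt(g(x)) - math.sqrt(h(x))) ** 2
    return math.sqrt(0.5 * total * step)

def exp_density(rate):
    """Exponential density with the given rate (illustrative test density)."""
    return lambda x: rate * math.exp(-rate * x)

if __name__ == "__main__":
    numeric = hellinger(exp_density(1.0), exp_density(2.0))
    closed = math.sqrt(1.0 - 2.0 * math.sqrt(2.0) / 3.0)
    print(numeric, closed)
```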

Now consider the problem of estimating the functionals

(23)    $T_1(g) = F(x_0)$ and $T_2(g) = f(x_0)$

based on a sample from density $g$. The difficulty of the problem of estimating
a functional $T(g)$ based on a sample of size $n$ from the density $g \in \mathcal{G}$ can be
quantified using the concept of a local minimax risk:

(24)    $R(n, T, \mathcal{G}_g) = \inf_{t_n} \sup_{h \in \mathcal{G}_g} E_{h^{\otimes n}} |t_n(X) - T(h)|$,

where the infimum is taken over all estimators $t_n$ based on the sample
$X = (X_1, \ldots, X_n)$. In Jongbloed (2000), an asymptotic lower bound to this
quantity is given in terms of a (local) modulus of continuity $m_g$ of $T$ over
$\mathcal{G}_g$:

$m_g(\varepsilon; T) = \sup\{ |T(h) - T(g)| : h \in \mathcal{G}_g \text{ and } H(h, g) \le \varepsilon \}$.

In fact, if it can be shown that

(25)    $m_g(\varepsilon; T) \ge (c\varepsilon)^r (1 + o(1))$ as $\varepsilon \downarrow 0$,
then [Corollary 2 in Jongbloed (2000)] 

(26) liminfW2i?(n,r,C?„) > -e-^'/^ ( -cy/^Y . 

n^oo 4 \2 / 

Theorem 5.1. Let $T_1$ and $T_2$ be defined as in (23) and $\mathcal{G}$ as in (22).
Assume that condition (20) is satisfied for the density $f_0$ associated with $g_0$.
Then, for the local minimax risk defined in (24), we have



liminfn2/^i?(n,Ti,a„,,) > 

and 



l/ |/^(xo)|go(xo)2 \V5 
^"^-8^ 100e2fc(0)4 



hmmfn R{n,T2,Gg,) > j . 

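Proof. We construct a family $\{g_\varepsilon : \varepsilon \in [0, \varepsilon_0]\} \subset \mathcal{G}$ with the following
properties:

(27)    $|T_1(g_\varepsilon) - T_1(g_0)| = \frac{1}{2} \varepsilon^2 |f_0'(x_0)| (1 + o(1))$ and $|T_2(g_\varepsilon) - T_2(g_0)| = \varepsilon |f_0'(x_0)| (1 + o(1))$

for $\varepsilon \downarrow 0$. Moreover,

(28)    $H(g_\varepsilon, g_0) \le (c_1 \varepsilon)^{5/2} (1 + o(1)) \ \Rightarrow\ H(g_{\varepsilon^{2/5}/c_1}, g_0) \le \varepsilon (1 + o(1))$ as $\varepsilon \downarrow 0$,

where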



Cl 

This means that for e J, 



V 550(2:0) 



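$m_{g_0}(\varepsilon; T_1) \ge |T_1(g_{\varepsilon^{2/5}/c_1}) - T_1(g_0)| (1 + o(1)) = \frac{|f_0'(x_0)|}{2 c_1^2}\, \varepsilon^{4/5} (1 + o(1))$

and

$m_{g_0}(\varepsilon; T_2) \ge |T_2(g_{\varepsilon^{2/5}/c_1}) - T_2(g_0)| (1 + o(1)) = \frac{|f_0'(x_0)|}{c_1}\, \varepsilon^{2/5} (1 + o(1))$.

Using these facts in (25) and (26), the statement of the theorem follows.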

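Let us now define the class $\{g_\varepsilon : \varepsilon \in [0, \varepsilon_0]\}$ and prove (27) and (28). This
class is defined based on a perturbation of the underlying distribution
function $F_0$. Indeed,

$g_\varepsilon(z) = \int k(z - x)\, dF_\varepsilon(x)$

with

$F_\varepsilon(x) = F_0(x)$, if $x \notin [x_0 - c_\varepsilon \varepsilon, x_0 + \varepsilon]$,
$F_\varepsilon(x) = F_0(x_0 - c_\varepsilon \varepsilon) + (x - x_0 + c_\varepsilon \varepsilon) f_0(x_0 - c_\varepsilon \varepsilon)$, if $x \in [x_0 - c_\varepsilon \varepsilon, x_0 - \varepsilon]$,
$F_\varepsilon(x) = F_0(x_0 + \varepsilon) + (x - x_0 - \varepsilon) f_0(x_0 + \varepsilon)$, if $x \in (x_0 - \varepsilon, x_0 + \varepsilon]$.

Here, $c_\varepsilon$ is chosen in such a way that $F_\varepsilon$ is continuous at $x_0 - \varepsilon$. Note
that $c_\varepsilon \to 3$ as $\varepsilon \downarrow 0$ and $F_\varepsilon$ is a concave distribution function on $[0, \infty)$,
for all small values of $\varepsilon$. By assumption (20), the statements in (27) follow
immediately. A proof of (28) is given in the Appendix. □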
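
The claim that $c_\varepsilon \to 3$ can be checked numerically. The sketch below solves the continuity condition at $x_0 - \varepsilon$ (value of the tangent line at $x_0 - c\varepsilon$ equals value of the tangent line at $x_0 + \varepsilon$) by bisection. The concave test distribution $F_0(x) = 1 - e^{-x}$, the point $x_0 = 1$, the bracket $[1, 5]$ and the function names are assumptions of this example.

```python
import math

def F0(x):
    """Illustrative concave distribution function on [0, inf)."""
    return 1.0 - math.exp(-x)

def f0(x):
    """Its decreasing density."""
    return math.exp(-x)

def gap(c, x0, eps):
    """Difference of the two linear pieces of F_eps at the knot x0 - eps."""
    left = F0(x0 - c * eps) + (c - 1.0) * eps * f0(x0 - c * eps)
    right = F0(x0 + eps) - 2.0 * eps * f0(x0 + eps)
    return left - right

def solve_c(x0, eps, lo=1.0, hi=5.0, iters=60):
    """Bisection for gap(c) = 0; gap is increasing in c on [lo, hi] here."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid, x0, eps) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    for eps in (0.1, 0.01, 0.001):
        print(eps, solve_c(1.0, eps))
```

For a locally quadratic $F_0$ the condition reduces to $c - c^2/2 = -3/2$, whose root larger than one is $c = 3$, which is what the numerical values approach as $\varepsilon$ decreases.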



6. Asymptotic distribution theory for the LS-estimator. Theorem 2.10 
gives a characterization of the least squares estimator that can be used to 
derive the limit behavior of the estimator at a fixed point. Let
$\mathcal{T}_n \subset \mathcal{Z}_n = \{Z_1, \ldots, Z_n\}$ denote the set of bend points of $\hat s_n$.

In this section we prove the following result.
Theorem 6.1. Suppose that $s_0$ is twice continuously differentiable in a
neighborhood of $x_0$, with strictly positive second derivative. Then,

(29)    $\begin{pmatrix} n^{2/5} c_1(s_0, k) (\hat s_n(x_0) - s_0(x_0)) \\ n^{1/5} c_2(s_0, k) (\hat s_n'(x_0) - s_0'(x_0)) \end{pmatrix} \xrightarrow{d} \begin{pmatrix} H''(0) \\ H'''(0) \end{pmatrix}$.

Here $(H''(0), H'''(0))$ are the second and third derivatives at zero of the
invelope $H$ of the stochastic process

$Y(t) = \int_0^t W(s)\, ds + t^4$

(where $W$ is standard two-sided Brownian motion), introduced in Theorem
2.1 of Groeneboom, Jongbloed and Wellner (2001a). The constants $c_1$ and
$c_2$ are given by

$c_1(s_0, k) = \Bigl( \frac{24\, k(0)^4}{g_0(x_0)^2\, s_0''(x_0)} \Bigr)^{1/5}, \qquad c_2(s_0, k) = \Bigl( \frac{24}{s_0''(x_0)} \Bigr)^{3/5} \Bigl( \frac{k(0)^2}{g_0(x_0)} \Bigr)^{1/5}$.

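Proof. Consider the processes

$$H_n(x) = \int_0^x\!\!\int_0^y \tilde s_n(u)\,du\,dy - x\left(\int_0^\infty \tilde s_n(u)^2\,du - \int_0^\infty \tilde s_n(u)\,dU_n(u)\right) \tag{30}$$

and

$$Y_n(x) = \int_0^x U_n(y)\,dy.$$

By Theorem 2.10, the characterization of the LS estimator can be written
as

$$Y_n(x) \begin{cases} \le H_n(x), & \text{for all } x \in \mathcal{Z}_n,\\ = H_n(x), & \text{for all } x \in \mathcal{T}_n. \end{cases}$$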
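Now define, for $t \in [-n^{1/5}x_0, \infty)$, localized versions of $Y_n$ and $H_n$:

$$\begin{aligned} Y_n^{\mathrm{loc}}(t) &= n^{4/5}\Big(Y_n(x_0 + n^{-1/5}t) - Y_n(x_0) - n^{-1/5}t\,Y_n'(x_0) - \tfrac12 n^{-2/5}t^2 s_0(x_0) - \tfrac16 n^{-3/5}t^3 s_0'(x_0)\Big)\\ &= n^{4/5}\int_{x_0}^{x_0+n^{-1/5}t}\Big(U_n(v) - U_n(x_0) - \int_{x_0}^{v}\big(s_0(x_0) + (u - x_0)s_0'(x_0)\big)\,du\Big)\,dv \end{aligned}$$

and

$$H_n^{\mathrm{loc}}(t) = n^{4/5}\Big(H_n(x_0 + n^{-1/5}t) - H_n(x_0) - n^{-1/5}t\,H_n'(x_0) - \tfrac12 n^{-2/5}t^2 s_0(x_0) - \tfrac16 n^{-3/5}t^3 s_0'(x_0)\Big) + A_n + B_n t,$$

where

$$A_n = n^{4/5}(H_n(x_0) - Y_n(x_0)) \quad\text{and}\quad B_n = n^{3/5}(H_n'(x_0) - Y_n'(x_0)). \tag{31}$$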

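By Lemma A.2, the random variables $A_n$ and $B_n$ are tight.

The necessary and sufficient conditions for optimality can then be rewritten
as

$$Y_n^{\mathrm{loc}}(t) \begin{cases} \le H_n^{\mathrm{loc}}(t), & \text{for all } t \in [-n^{1/5}x_0, \infty) \text{ with } x_0 + n^{-1/5}t \in \mathcal{Z}_n,\\ = H_n^{\mathrm{loc}}(t), & \text{for all } t \text{ with } x_0 + n^{-1/5}t \in \mathcal{T}_n. \end{cases}$$

If we define the process $Z_n$ by

$$Z_n(t) := n^{3/5}\big((U_n - U_0)(x_0 + n^{-1/5}t) - (U_n - U_0)(x_0)\big),$$

then the process $Y_n^{\mathrm{loc}}$ can be rewritten as

$$\begin{aligned} Y_n^{\mathrm{loc}}(t) &= n^{4/5}\int_{x_0}^{x_0+n^{-1/5}t}\big(U_n(v) - U_n(x_0) - (U_0(v) - U_0(x_0))\big)\,dv\\ &\quad + n^{4/5}\int_{x_0}^{x_0+n^{-1/5}t}\!\!\int_{x_0}^{v}\big(s_0(u) - s_0(x_0) - (u - x_0)s_0'(x_0)\big)\,du\,dv\\ &= \int_0^t Z_n(v)\,dv + \tfrac{1}{24}s_0''(x_0)\,t^4 + o(1), \end{aligned}$$

where for any $c > 0$ the $o(1)$ term is uniform in $t \in [-c, c]$ as $n$ tends to
infinity. By Lemma A.6 and the continuous mapping theorem, it now follows
that

$$Y_n^{\mathrm{loc}}(t) \xrightarrow{\;d\;} \frac{\sqrt{g_0(x_0)}}{k(0)}\int_0^t W(s)\,ds + \tfrac{1}{24}s_0''(x_0)\,t^4.$$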

Now we proceed by rescaling the axes in the necessary conditions for
optimality in such a way that the limiting process behavior of $Y_n^{\mathrm{loc}}$ will no
longer depend on the underlying functions $s_0$ and $k$. For any $\alpha, \beta > 0$, the
necessary and sufficient conditions can be rewritten as

$$\alpha H_n^{\mathrm{loc}}(\beta t) \begin{cases} \ge \alpha Y_n^{\mathrm{loc}}(\beta t), & \text{for all } t \in [-c, c],\\ = \alpha Y_n^{\mathrm{loc}}(\beta t), & \text{for all } t \in [-c, c] \text{ with } x_0 + n^{-1/5}\beta t \in \mathcal{T}_n. \end{cases}$$
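In the limit, the right-hand side is given by

$$\alpha\,\frac{\sqrt{g_0(x_0)}}{k(0)}\int_0^{\beta t} W(s)\,ds + \frac{\alpha\beta^4}{24}\,s_0''(x_0)\,t^4.$$

By Brownian scaling, that is, using that for each $\gamma > 0$, $\gamma^{-1/2}W(\gamma\,\cdot)$ is
Brownian motion whenever $W$ is, we get that in distribution this process is the
same as

$$\alpha\beta^{3/2}\,\frac{\sqrt{g_0(x_0)}}{k(0)}\int_0^{t} W(s)\,ds + \frac{\alpha\beta^4}{24}\,s_0''(x_0)\,t^4.$$

In order to get a process that does not depend on properties of $g_0$ or $s_0$, we
choose $\alpha$ and $\beta$ such that

$$\alpha\beta^{3/2}\,\frac{\sqrt{g_0(x_0)}}{k(0)} = 1 \quad\text{and}\quad \frac{\alpha\beta^4}{24}\,s_0''(x_0) = 1,$$

yielding

$$\alpha = \left(\frac{s_0''(x_0)}{24}\right)^{3/5}\left(\frac{k(0)^2}{g_0(x_0)}\right)^{4/5} \quad\text{and}\quad \beta = \left(\frac{24}{s_0''(x_0)}\right)^{2/5}\left(\frac{g_0(x_0)}{k(0)^2}\right)^{1/5}.$$

Note that, for the rescaled process $t \mapsto \alpha H_n^{\mathrm{loc}}(\beta t)$,

$$\frac{d^2}{dt^2}\Big[\alpha H_n^{\mathrm{loc}}(\beta t)\Big]_{t=0} = \alpha\beta^2 n^{2/5}(\tilde s_n(x_0) - s_0(x_0)) = c_1(s_0,k)\,n^{2/5}(\tilde s_n(x_0) - s_0(x_0))$$

and

$$\frac{d^3}{dt^3}\Big[\alpha H_n^{\mathrm{loc}}(\beta t)\Big]_{t=0} = \alpha\beta^3 n^{1/5}(\tilde s_n'(x_0) - s_0'(x_0)) = c_2(s_0,k)\,n^{1/5}(\tilde s_n'(x_0) - s_0'(x_0)).$$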
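
The choice of the rescaling constants can be verified with a few lines of arithmetic. The sketch below is an illustration, not part of the paper: the function name `scaling_constants` and the numerical inputs are hypothetical, and it assumes the two normalisation conditions $\alpha\beta^{3/2}\sqrt{g_0(x_0)}/k(0) = 1$ and $\alpha\beta^4 s_0''(x_0)/24 = 1$, which follow from the limit process and Brownian scaling. It then confirms that $\alpha\beta^2$ and $\alpha\beta^3$ reproduce $c_1$ and $c_2$ of Theorem 6.1.

```python
import math

def scaling_constants(s0_dd, g0, k0):
    """Return (alpha, beta, c1, c2) for s0''(x0) = s0_dd, g0(x0) = g0, k(0) = k0.

    Assumes the normalisations
        alpha * beta**1.5 * sqrt(g0) / k0 == 1   (Brownian part has unit scale)
        alpha * beta**4 * s0_dd / 24   == 1      (drift becomes t**4)
    """
    beta = (24.0 / s0_dd) ** 0.4 * (g0 / k0 ** 2) ** 0.2
    alpha = (s0_dd / 24.0) ** 0.6 * (k0 ** 2 / g0) ** 0.8
    c1 = alpha * beta ** 2   # multiplies n^{2/5} (s_n(x0) - s0(x0))
    c2 = alpha * beta ** 3   # multiplies n^{1/5} (s_n'(x0) - s0'(x0))
    return alpha, beta, c1, c2

if __name__ == "__main__":
    # Hypothetical values, chosen only to exercise the formulas.
    alpha, beta, c1, c2 = scaling_constants(s0_dd=0.7, g0=0.3, k0=1.5)
    print(alpha * beta ** 1.5 * math.sqrt(0.3) / 1.5)  # close to 1
    print(alpha * beta ** 4 * 0.7 / 24.0)               # close to 1
    print(c1, c2)
```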

From this point on, essentially the same reasoning can be followed as in
the proof of Theorem 6.3 in Groeneboom, Jongbloed and Wellner (2001b).
Indeed, the necessary and sufficient conditions for optimality can be pushed
to the limiting characterization related to the process studied in
[Groeneboom, Jongbloed and Wellner (2001b), pages 1689-1690], where also Lemma
A.4 is needed to use their tightness argument. This leads to the convergence
of the vector of second and third derivatives at zero of the rescaled process
$t \mapsto \alpha H_n^{\mathrm{loc}}(\beta t)$, as described in (29). □

Remark 6.2. Because $s_0' = -f_0$ by definition, the asymptotic standard
deviations of $\tilde s_n$ and $\tilde s_n'$ coincide with the asymptotic bounds on the minimax
risk given in Theorem 5.1, apart from some constants not depending on the
underlying functions $s_0$ and $k$.



APPENDIX 

Lemma A.1. Let $F_n$ be a sequence of concave distribution functions on
$[0,\infty)$ converging to the concave (sub)distribution function $F$ pointwise on
$(0,\infty)$ (i.e., the corresponding sequence of distributions converges vaguely
to the subdistribution corresponding to $F$). Let $k$ be a density on $(0,\infty)$
satisfying Assumption 2.6. Denote by $g_n$ and $g$ the convolutions of $k$ with
$F_n$ and $F$ respectively. Then, $g_n$ converges to $g$ uniformly on closed bounded
intervals not containing 0.

Proof. Denote for $p = 1, 2, \ldots$ by $k^{(p)}$ compactly supported functions
such that for all $p$, $0 \le k^{(p)} \le k$ and such that $\|k - k^{(p)}\|_1 < 1/p$. Choose
arbitrary $M > 0$, and define $\|g\|_{1,M} = \int_0^M |g(z)|\,dz$. By the triangle inequality,

$$\|g_n - g\|_{1,M} \le \|g_n - g_n^{(p)}\|_1 + \|g_n^{(p)} - g^{(p)}\|_{1,M} + \|g - g^{(p)}\|_1, \tag{32}$$

where $g_n^{(p)} = k^{(p)} * dF_n$ and $g^{(p)} = k^{(p)} * dF$. Now, choose $\varepsilon > 0$ and take
$p > 3/\varepsilon$. For the last term in (32) we have, via Fubini,

$$\|g - g^{(p)}\|_1 = \int_0^\infty\!\!\int_0^z \big(k(z-x) - k^{(p)}(z-x)\big)\,dF(x)\,dz \le \|k - k^{(p)}\|_1 < 1/p < \varepsilon/3.$$

The first term in (32) is smaller than $\varepsilon/3$ for the same reason. By the
assumed vague convergence, we have for all $z$, $|g_n^{(p)}(z) - g^{(p)}(z)| \to 0$ because
$k^{(p)}$ is bounded, continuous and has bounded support. Because $g_n^{(p)}(z) \le
g_n(z) \le k(0+)$, $\|g_n^{(p)} - g^{(p)}\|_{1,M} < \varepsilon/3$ for $n$ sufficiently large by dominated
convergence. Now, consider for $\eta > 1$ an interval $[1/\eta, \eta]$. Note that on this
interval the densities of $F_n$ and $F$ necessarily take values in the interval
$[0, \eta]$. This means that all $g_n$ and $g$ are Lipschitz continuous with constant
$\|k'\|_\infty + k(0)\eta$:

$$|g(z+h) - g(z)| \le \int_0^z |k(z+h-x) - k(z-x)|\,dF(x) + \int_z^{z+h} k(z+h-x) f(x)\,dx \le h\big(\|k'\|_\infty + k(0)\eta\big).$$

This, together with the $\|\cdot\|_{1,M}$ convergence, implies the uniform convergence
on $[1/\eta, \eta]$. □

Computational details for the maximum likelihood estimator. We aim
to minimize

$$l_n(g) = -\int \log g(x)\,dG_n(x) + \int g(x)\,dx$$

over the set

$$\mathcal{G} := \left\{ g : g(x) = \int_{[0,\infty)} g_\theta(x)\,d\mu(\theta),\ \mu \text{ is a positive finite measure} \right\}.$$

The addition of the $\int g(x)\,dx$-term in the objective function enables us to
minimize over a convex cone instead of a convex hull, since the minimizer
of $l_n$ can in fact be shown to be a probability density. By Theorem 2.1, it
suffices to consider measures supported on $\mathcal{Z}_n$.

As shown in Section 7 of Groeneboom, Jongbloed and Wellner (2008),
given a current iterate $\bar g$, instead of $l_n$, we can minimize the local objective
function

$$l_n(g; \bar g) = \int g(x)\,dx - 2\int \frac{g}{\bar g}(x)\,dG_n(x) + \frac12\int \frac{g^2}{\bar g^2}(x)\,dG_n(x),$$

which is a local quadratic approximation of the objective function near $\bar g$.
This quadratic function can be minimized over the (finitely generated) cone
using the support reduction algorithm, yielding

$$\bar g_q = \operatorname{argmin}\{ l_n(g; \bar g) : g \in \operatorname{cone}(g_\theta : \theta \in \mathcal{Z}_n) \}.$$

The next iterate is then obtained as $\bar g + \lambda(\bar g_q - \bar g)$ ($\lambda$ chosen appropriately
to assure monotonicity of the algorithm).

We now turn to the details of the support reduction algorithm. To find a
new support point (a direction of descent), we first compute

$$l_n(g + \varepsilon g_\theta; \bar g) - l_n(g; \bar g) = \tfrac12\varepsilon^2 c_2(\theta) + \varepsilon c_1(\theta; g).$$

Here,

$$c_1(\theta; g) = 1 - 2\int \frac{g_\theta}{\bar g}(x)\,dG_n(x) + \int \frac{g_\theta\, g}{\bar g^2}(x)\,dG_n(x), \qquad c_2(\theta) = \int \frac{g_\theta^2}{\bar g^2}(x)\,dG_n(x).$$

Computations that are completely analogous to those of Section 4 in
Groeneboom, Jongbloed and Wellner (2008), then show that the most promising
direction is given by

$$\hat\theta = \operatorname*{argmin}_{\theta \in \mathcal{Z}_n} \frac{c_1(\theta; g)}{\sqrt{c_2(\theta)}}. \tag{33}$$

The second step consists of minimizing $l_n(\sum_{i=1}^m \alpha_i g_{\theta_i}; \bar g)$ over $\alpha_1, \ldots, \alpha_m$
(without restrictions on $\alpha_i$). Now

$$l_n\Big(\sum_{i=1}^m \alpha_i g_{\theta_i}; \bar g\Big) = \sum_{i=1}^m \alpha_i\Big(1 - 2\int \frac{g_{\theta_i}}{\bar g}(x)\,dG_n(x)\Big) + \frac12\sum_{i=1}^m\sum_{j=1}^m \alpha_i\alpha_j \int \frac{g_{\theta_i} g_{\theta_j}}{\bar g^2}(x)\,dG_n(x).$$

Differentiating with respect to $\alpha_i$ yields the linear system of equations
$A(\alpha_1, \ldots, \alpha_m)' = b$, where

$$A_{i,j} = \int \frac{g_{\theta_i} g_{\theta_j}}{\bar g^2}(x)\,dG_n(x), \qquad b_i = -1 + 2\int \frac{g_{\theta_i}}{\bar g}(x)\,dG_n(x).$$

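Computational details for the least squares estimator. The least squares
estimator is defined as the minimizer of

$$Q_n(s) = \frac12\int_0^\infty s(x)^2\,dx - \int_0^\infty s(x)\,dU_n(x)$$

over the set $\mathcal{S}_n$ as defined in (16). If $s \in \mathcal{S}_n$, then $s(x) = \int s_\theta(x)\,d\mu(\theta)$,
where $s_\theta(x) = (1 - x/\theta)_+$ and $\mu$ is a probability measure supported on $\mathcal{Z}_n$. In
the following, we denote $\langle f, g\rangle = \int f(x)g(x)\,dx$ and $\langle f, dU_n\rangle = \int f(x)\,dU_n(x)$.

In the first step of the support reduction algorithm we look for a direction
of descent. Given an iterate $s$, the directional derivative in the direction of
$s_\theta$ is given by

$$c_1(\theta; s) = \lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\big(Q_n(s + \varepsilon s_\theta) - Q_n(s)\big) = \langle s, s_\theta\rangle - \langle s_\theta, dU_n\rangle.$$

The new support point is given by $\hat\theta = \operatorname{argmin}_{\theta \in \mathcal{Z}_n} c_1(\theta; s)$. By Theorem
2.10, the optimal solution $\tilde s$ satisfies $c_1(\theta; \tilde s) \ge \langle \tilde s, \tilde s\rangle - \langle \tilde s, dU_n\rangle$.

The second step of the algorithm consists of minimizing $Q_n(\sum_{i=1}^m \alpha_i s_{\theta_i})$
over all $\alpha_i$ such that $\sum_{i=1}^m \alpha_i = 1$. If $m = 1$, we simply have $\alpha_1 = 1$. Else, we
set $\alpha_1 = 1 - \sum_{i=2}^m \alpha_i$ and minimize over $\alpha_2, \ldots, \alpha_m$ (without restrictions).
We can write

$$\begin{aligned} Q_n\Big(\sum_{i=1}^m \alpha_i s_{\theta_i}\Big) &= \frac12\sum_{i=1}^m\sum_{j=1}^m \alpha_i\alpha_j\langle s_{\theta_i}, s_{\theta_j}\rangle - \sum_{i=1}^m \alpha_i\langle s_{\theta_i}, dU_n\rangle\\ &= \frac12\langle s_{\theta_1}, s_{\theta_1}\rangle + \sum_{i=2}^m \alpha_i\langle s_{\theta_1}, s_{\theta_i} - s_{\theta_1}\rangle + \frac12\sum_{i=2}^m\sum_{j=2}^m \alpha_i\alpha_j\langle s_{\theta_i} - s_{\theta_1}, s_{\theta_j} - s_{\theta_1}\rangle\\ &\quad - \langle s_{\theta_1}, dU_n\rangle - \sum_{i=2}^m \alpha_i\langle s_{\theta_i} - s_{\theta_1}, dU_n\rangle. \end{aligned}$$

Differentiating with respect to $\alpha_i$ ($i = 2, \ldots, m$) yields the linear system of
equations $A(\alpha_2, \ldots, \alpha_m)' = b$, where

$$A_{i-1,j-1} = \langle s_{\theta_i} - s_{\theta_1}, s_{\theta_j} - s_{\theta_1}\rangle, \qquad i, j = 2, \ldots, m,$$

and

$$b_{i-1} = \langle s_{\theta_1} - s_{\theta_i}, s_{\theta_1}\rangle - \langle s_{\theta_1} - s_{\theta_i}, dU_n\rangle, \qquad i = 2, \ldots, m.$$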
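
The matrix $A$ of this linear system only involves inner products of the basis functions $s_\theta(x) = (1 - x/\theta)_+$, which have a simple closed form: for $a \le b$, $\langle s_a, s_b\rangle = a/2 - a^2/(6b)$. The sketch below is an illustration rather than the authors' implementation; the function names are hypothetical, and the right-hand side $b$ is left out because it depends on $U_n$, which is not reproduced here. It builds $A$ from the closed form and checks the closed form against a midpoint rule.

```python
def inner(a, b):
    """Closed form of <s_a, s_b> = int_0^inf (1 - x/a)_+ (1 - x/b)_+ dx."""
    lo, hi = min(a, b), max(a, b)
    return lo / 2.0 - lo * lo / (6.0 * hi)

def inner_numeric(a, b, n=20000):
    """Midpoint-rule approximation of the same integral (check only)."""
    top = min(a, b)          # the product vanishes beyond min(a, b)
    h = top / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        total += (1.0 - x / a) * (1.0 - x / b)
    return total * h

def gram_matrix(thetas):
    """A[i-1][j-1] = <s_{theta_i} - s_{theta_1}, s_{theta_j} - s_{theta_1}>, i, j >= 2."""
    t1, rest = thetas[0], thetas[1:]
    return [[inner(ti, tj) - inner(ti, t1) - inner(t1, tj) + inner(t1, t1)
             for tj in rest] for ti in rest]

if __name__ == "__main__":
    print(inner(1.0, 2.5), inner_numeric(1.0, 2.5))
    print(gram_matrix([0.5, 1.0, 2.5]))   # hypothetical support points
```

Because $A$ is a Gram matrix of the differences $s_{\theta_i} - s_{\theta_1}$, it is symmetric and positive semidefinite, so any standard linear solver can be applied to $A\alpha = b$.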

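Proof of (28). For ease of notation we shall omit subscripts on $f$ and $g$ in
the proof. Thus, we write $f$ instead of $f_0$. We use Lemma 2 from Jongbloed
(2000), which states that

$$H^2(g_\varepsilon, g) \sim \frac18\int_{\{x : g(x) > 0\}} \frac{(g_\varepsilon(x) - g(x))^2}{g(x)}\,dx = I_\varepsilon^{(1)} + I_\varepsilon^{(2)} + I_\varepsilon^{(3)} \quad\text{as } \varepsilon \downarrow 0,$$

where $I_\varepsilon^{(1)}$, $I_\varepsilon^{(2)}$ and $I_\varepsilon^{(3)}$ are defined as the integral over the regions $[x_0 - c_\varepsilon\varepsilon,
x_0 - \varepsilon]$, $(x_0 - \varepsilon, x_0 + \varepsilon]$ and $(x_0 + \varepsilon, \infty)$ respectively. Note that, for all $x > 0$,

$$g(x) - g_\varepsilon(x) = \int_{x_0 - c_\varepsilon\varepsilon}^{x_0 - \varepsilon} k(x-u)\big(f(u) - f(x_0 - c_\varepsilon\varepsilon)\big)\,du + \int_{x_0-\varepsilon}^{x_0+\varepsilon} k(x-u)\big(f(u) - f(x_0 + \varepsilon)\big)\,du$$

and that, for $x < x_0 - c_\varepsilon\varepsilon$, this difference is zero, since $k(x) = 0$ for $x < 0$.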
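For $x \in [x_0 - c_\varepsilon\varepsilon, x_0 - \varepsilon]$, we have that $g(x) - g_\varepsilon(x) = \int_{x_0 - c_\varepsilon\varepsilon}^{x} k(x-u)(f(u) -
f(x_0 - c_\varepsilon\varepsilon))\,du$. Since $k$ satisfies (13), $\sup_{u \in (x_0 - c_\varepsilon\varepsilon,\, x)} |k(x-u) - k(0)| = o(1)$
as $\varepsilon \downarrow 0$. Furthermore, condition (20) implies

$$f(u) - f(x_0 - c_\varepsilon\varepsilon) = (u - x_0 + c_\varepsilon\varepsilon)\,f'(\xi), \qquad \xi \in (x_0 - c_\varepsilon\varepsilon, u) \subset (x_0 - c_\varepsilon\varepsilon, x_0 - \varepsilon).$$

If $\varepsilon \downarrow 0$, then $\xi \to x_0$ and $f'(\xi) \to f'(x_0)$, since $f'$ is continuous at $x_0$. Hence,

$$\begin{aligned} g(x) - g_\varepsilon(x) &= \int_{x_0 - c_\varepsilon\varepsilon}^{x} (k(0) + o(1))(u - x_0 + c_\varepsilon\varepsilon)(f'(x_0) + o(1))\,du\\ &= \tfrac12 k(0) f'(x_0)\big[(u - x_0 + c_\varepsilon\varepsilon)^2\big]_{u = x_0 - c_\varepsilon\varepsilon}^{u = x}(1 + o(1))\\ &= \tfrac12 k(0) f'(x_0)(x - x_0 + c_\varepsilon\varepsilon)^2(1 + o(1)). \end{aligned} \tag{34}$$

Hence,

$$I_\varepsilon^{(1)} = \frac18\int_{x_0 - c_\varepsilon\varepsilon}^{x_0 - \varepsilon} \frac{(g_\varepsilon(x) - g(x))^2}{g(x)}\,dx = \frac{k(0)^2 f'(x_0)^2}{32}\int_{x_0 - c_\varepsilon\varepsilon}^{x_0-\varepsilon} \frac{(x - x_0 + c_\varepsilon\varepsilon)^4}{g(x)}\,dx\,(1 + o(1)) = \frac{k(0)^2 f'(x_0)^2}{5 g(x_0)}\,\varepsilon^5(1 + o(1)).$$

For $x \in (x_0 - \varepsilon, x_0 + \varepsilon)$,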



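$$g(x) - g_\varepsilon(x) = \int_{x_0 - c_\varepsilon\varepsilon}^{x_0-\varepsilon} k(x-u)\big(f(u) - f(x_0 - c_\varepsilon\varepsilon)\big)\,du + \int_{x_0-\varepsilon}^{x} k(x-u)\big(f(u) - f(x_0 + \varepsilon)\big)\,du.$$

In exactly the same manner as the previous case, we can find asymptotic
order relations for this expression. For the first term we get, from (34),

$$\int_{x_0 - c_\varepsilon\varepsilon}^{x_0-\varepsilon} k(x-u)\big(f(u) - f(x_0 - c_\varepsilon\varepsilon)\big)\,du = 2k(0) f'(x_0)\,\varepsilon^2(1 + o(1)).$$

For the second term we get

$$\int_{x_0-\varepsilon}^{x} k(x-u)\big(f(u) - f(x_0 + \varepsilon)\big)\,du = \tfrac12 k(0) f'(x_0)\big[(x - x_0 - \varepsilon)^2 - 4\varepsilon^2\big](1 + o(1)).$$

This gives $g(x) - g_\varepsilon(x) = \tfrac12 k(0) f'(x_0)(x - x_0 - \varepsilon)^2(1 + o(1))$, and thus

$$I_\varepsilon^{(2)} = \frac18\int_{x_0-\varepsilon}^{x_0+\varepsilon} \frac{(g_\varepsilon(x) - g(x))^2}{g(x)}\,dx = \frac{k(0)^2 f'(x_0)^2}{5 g(x_0)}\,\varepsilon^5(1 + o(1)).$$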

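Now take $x > x_0 + \varepsilon$. Then we can write

$$\begin{aligned} g(x) - g_\varepsilon(x) &= \int_{x_0 - c_\varepsilon\varepsilon}^{x_0-\varepsilon} k(x-u)\big(f(u) - f(x_0 - c_\varepsilon\varepsilon)\big)\,du + \int_{x_0-\varepsilon}^{x_0+\varepsilon} k(x-u)\big(f(u) - f(x_0 + \varepsilon)\big)\,du\\ &= \int_{x_0-\varepsilon}^{x_0+\varepsilon} \Big\{k(x-u)\big[f(u) - f(x_0+\varepsilon)\big] + k\big(x - u + (c_\varepsilon - 1)\varepsilon\big)\big[f\big(u - (c_\varepsilon - 1)\varepsilon\big) - f(x_0 - c_\varepsilon\varepsilon)\big]\Big\}\,du. \end{aligned}$$

Next, we use relations like

$$f(u) - f(x_0 + \varepsilon) = (u - x_0 - \varepsilon)\,f'(x_0)(1 + o(1))$$

and

$$k(x - u) = k(x - x_0) + (x_0 - u)\,k'(x - x_0)(1 + o(1))$$

to obtain

$$\begin{aligned} g(x) - g_\varepsilon(x) &= k'(x - x_0) f'(x_0)\int_{x_0-\varepsilon}^{x_0+\varepsilon} \big\{(x_0 - u)(u - x_0 - \varepsilon) + (u - x_0 + \varepsilon)(x_0 - u + (c_\varepsilon - 1)\varepsilon)\big\}\,du\,(1 + o(1))\\ &= \tfrac83\,k'(x - x_0) f'(x_0)\,\varepsilon^3(1 + o(1)). \end{aligned}$$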

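Now, by (21), $I_\varepsilon^{(3)}$ is of smaller order than $\varepsilon^5$, so that

$$H^2(g_\varepsilon, g) = \frac{2k(0)^2 f'(x_0)^2}{5 g(x_0)}\,\varepsilon^5(1 + o(1)) \quad\text{as } \varepsilon \downarrow 0.$$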
Technical results for deriving the asymptotic distribution. In what
follows we assume, as in Theorem 6.1, that $s_0$ is twice continuously
differentiable in a neighborhood of $x_0$, with strictly positive second derivative.

Lemma A.2. The random variables $A_n$ and $B_n$ as defined in (31) are
tight.

To be able to prove the lemma, we first need to prove several other lemmas.

Distance between successive bend points of the estimator. Recall that
$\mathcal{T}_n$ denotes the set of bend points of $\tilde s_n$. For a sequence $\xi_n$ converging to $x_0$,
define the bend points to the left and right of $\xi_n$ by

$$\tau_n^- = \max\{x \in \mathcal{T}_n : x \le \xi_n\} \quad\text{and}\quad \tau_n^+ = \min\{x \in \mathcal{T}_n : x > \xi_n\}. \tag{35}$$

By consistency and the local assumption of strict convexity of $s_0$ in a
neighborhood of $x_0$, it follows that $\tau_n^+ - \tau_n^- \to 0$ as $n \to \infty$. The lemma below
strengthens this to a rate result for $\tau_n^+ - \tau_n^-$ that is used to obtain a rate
result for the LS estimator itself.

Lemma A.3. Let $\xi_n$ be a sequence converging to $x_0$. Let $\tau_n^-$ and $\tau_n^+$ be
defined according to (35). Then,

$$\tau_n^+ - \tau_n^- = O_p(n^{-1/5}).$$


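Proof. Define, for $u < v$, the v-shaped functions connecting the points
$(u, 1)$, $((u + v)/2, -1)$, and $(v, 1)$, also used in Mammen (1991):

$$f_{u,v}(x) = \left(\frac{4}{v - u}\left|x - \frac{u + v}{2}\right| - 1\right) 1_{[u,v]}(x).$$

Note that

$$\int f_{u,v}(x)\,dx = \int x f_{u,v}(x)\,dx = 0 \quad\text{and}\quad \int x^2 f_{u,v}(x)\,dx = (v - u)^3/24. \tag{36}$$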
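
The three moment identities for the v-shaped function can be confirmed exactly. The following sketch is an illustration with hypothetical names and arbitrary rational endpoints: since $x^p f_{u,v}(x)$ is a polynomial of degree at most 3 on each half of $[u, v]$, Simpson's rule applied to each half gives the exact integral, and rational arithmetic avoids rounding.

```python
from fractions import Fraction

def f_uv(x, u, v):
    """V-shaped function through (u, 1), ((u+v)/2, -1), (v, 1); zero outside [u, v]."""
    if x < u or x > v:
        return Fraction(0)
    return 4 * abs(x - (u + v) / 2) / (v - u) - 1

def simpson(g, a, b):
    """Simpson's rule on [a, b]; exact for polynomials of degree <= 3."""
    return (b - a) / 6 * (g(a) + 4 * g((a + b) / 2) + g(b))

def moment(p, u, v):
    """Exact integral of x**p * f_uv(x) over [u, v], for p = 0, 1, 2."""
    mid = (u + v) / 2
    g = lambda x: x ** p * f_uv(x, u, v)
    return simpson(g, u, mid) + simpson(g, mid, v)

if __name__ == "__main__":
    u, v = Fraction(1, 3), Fraction(5, 2)   # arbitrary endpoints for the check
    print(moment(0, u, v), moment(1, u, v), moment(2, u, v), (v - u) ** 3 / 24)
```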



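Now, take $u = \tau_n^-$ and $v = \tau_n^+$ and define the function $\bar f_{u,v}$ as follows. First,
set $\bar f_{u,v}(0) = 0$. For $x = Z_i$, $i = 1, \ldots, n$, let $\bar f_{u,v}(x) := f_{u,v}(Z_i)$. In between these
points define $\bar f_{u,v}$ by linear interpolation. For $x > Z_{(n)}$, $\bar f_{u,v}(x) = 0$. Note
that $f_{u,v}$ and $\bar f_{u,v}$ only differ on the spacings containing $u$, $(u + v)/2$ and $v$.
Using (36) and that the maximal distance between successive order statistics
is $O_p(n^{-1}\log n)$, it follows that

$$\int \bar f_{u,v}(x)\,dx = O_p\!\left(\frac{\log n}{n}\right), \qquad \int x \bar f_{u,v}(x)\,dx = O_p\!\left(\frac{\log n}{n}\right) \tag{37}$$

and

$$\int x^2 \bar f_{u,v}(x)\,dx = (v - u)^3/24 + O_p\!\left(\frac{\log n}{n}\right). \tag{38}$$

Observe that, for small positive $\varepsilon$, the function $\tilde s_n + \varepsilon \bar f_{u,v} \in \mathcal{S}_n$. This implies
that

$$\lim_{\varepsilon \downarrow 0} \varepsilon^{-1}\big(Q_n(\tilde s_n + \varepsilon \bar f_{u,v}) - Q_n(\tilde s_n)\big) \ge 0, \quad\text{hence}\quad \int \tilde s_n(x) \bar f_{u,v}(x)\,dx - \int \bar f_{u,v}(x)\,dU_n(x) \ge 0.$$

Note that, by (37) and the fact that $\tilde s_n$ is linear on $[u, v]$, the first term is
$O_p(n^{-1}\log n)$. Hence,

$$\int \bar f_{u,v}(x)\,d(U_n - U_0)(x) + \int \bar f_{u,v}(x)\,dU_0(x) \le O_p\!\left(\frac{\log n}{n}\right). \tag{39}$$

Using that $U_0' = s_0$ and using a Taylor expansion for $s_0$ as well as (37) and
(38), we can write for the second term in (39)

$$\int \bar f_{u,v}(x)\,dU_0(x) = \tfrac{1}{48} s_0''(x_0)(v - u)^3 + O_p\!\left(\frac{\log n}{n}\right) + o\big((v - u)^3\big),$$

yielding

$$\int \bar f_{u,v}(x)\,d(U_n - U_0)(x) + \tfrac{1}{48} s_0''(x_0)(v - u)^3 \le O_p\!\left(\frac{\log n}{n}\right) + o\big((v - u)^3\big). \tag{40}$$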

For the first term in (40), we have 



+ / -/ \{Un-UQ){x)dx 



log re 



n 



V - U Uu J(u+v)/2 

= j ^u,v{^) dC^n - Go)(x) + Op 

using the notation p{x) = Jq p{y) dy, 
V^u,vix) = p{u - x) - p{v - x) 

■ ^p(n - x) - 2p^^^-^ - x^ + p(i; - x)^ 



V — u 






We now show that, for any e > 0, by taking A> sufficiently large, 
(41) 



Go){x) 



>e{v-uf + An ^/^^ 



can be made arbitrarily small, uniformly in n. To this end, define for i,j £ 
Kn = {l,2,...,\n^/^6]} the sets 

li = iCn - in-^'\in -{i- l)n-i/5] and 

Jj = {in + {j-l)n-^'\in+jn-^"'] 

and note that the class of functions J^ij- = {(pu,v - u £ Ii,j G Jj} is a VC class 
with envelope 

{c{j + i)n~^/^, for x G [0,^^ - m"^/^), 
c, for xG [Cn-m-i/5,^„+ jn-^S], 

0, for X > ^„ +jn~^/^, 

where c > is a constant. For deriving this envelope function, we use relation 
(14) and the Lipschitz continuity of I. For y <u, 

4 

\^u,v{y)\ < V\\oo{v -u)-\ \p{Cu,v,y){v - u) /2 - p{Uu,v,y){v - u)/2\ 



V — U 



< 



^{v-u) + 2\\e\\ooWu,v,y - Cu,v,y\ < 3||£||oo(^' " u). 

Taking into account that, for u G Ij and j £ Jj, 0<v — u<{i + j)n~^^^ , we 
get the first inequality in (42). The other bounds in (42) can be deduced 
similarly. 

For the probability in (41) we can now write 



P[3i,j eKn:3ueIi, 



{x)d{Gn - Go){x) 



<P[3i,jeKn:3ueIi. 



v€J. 



3 ■ 



n^/' / ^n,.(x)d(G„-Go)(x) 



> e(j + i-2f + A 



< P( 3i, j G Kn : sup 

UGli,V(iJj 



n'/' / ^u,v{x)d{Gn-Go){x) 



> e(j + i-2f + A 
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< E E ^ sup 



> e{j + i-2f + A 



n 



1/5 



(e(i + i-2)3 + A)2 







I sup 


\U£li,V£jj 





n^^'^ / (pu,vix)d{Gn - Go){x) 



<E E 

n 



To bound the expectation in the summand in this expression, we can then 
use Theorem 2.14.1 in van der Vaart and Wellner (1996), with envelope 
function (42), yielding, for some positive c, 

r \ 2 

n^/^ / ipu,vix)d{Gn-Go){x) 







I sup 







< c{{i + i)rr^l^ + (i + ifn-'^l^). 
This gives, as upper bound for probability (41),

^^ c((i + j) + (i+j)2n-V5) _ ^ A:(/t-l) + /fc2(fc_i)n-V5 



k=2 



(e(A;- 2)3 + ^)2 



which, by dominated convergence, can be made arbitrarily small by taking 
A sufficiently large. 

Combining this result with inequality (40), taking e = Sq(xo)/96, we ob- 
tain that by taking A sufficiently large, we have with arbitrarily high prob- 
ability that 



n3/54'(xo) 



48 



(r+-r-)3<n3/5 



(p^- ^+{x)d{Gn - Go){x) 



+ 0p 



logn 

j^2/5 



<^^^^^(r+-r-)3 + ^ + 0p^l°S" 



96 



n 



2/5 



implying that $\tau_n^+ - \tau_n^- = O_p(n^{-1/5})$. □

Rate results for the estimator. The next lemma shows that, in $O_p(n^{-1/5})$
neighborhoods of $x_0$, the minimal value of the difference between $\tilde s_n$ and $s_0$
over this neighborhood is $O_p(n^{-2/5})$.

Lemma A.4. Let $\xi_n$ be a sequence converging to $x_0$. For any $\varepsilon > 0$ there
exist an $M > 1$ and a $c > 0$, such that the following holds with probability
greater than $1 - \varepsilon$. There are bend points $\tau_n^- < \xi_n < \tau_n^+$ of $\tilde s_n$ with $n^{-1/5} \le
\tau_n^+ - \tau_n^- \le M n^{-1/5}$, and for any such points we have

$$\inf_{t \in [\tau_n^-, \tau_n^+]} |\tilde s_n(t) - s_0(t)| < c\,n^{-2/5} \quad\text{for all } n.$$



Proof. Applying Lemma A. 3 to the sequences ^„ i n"^/^ implies that 
for any e > we can find an M > 1, such that with probability greater than 
1 — e there are bend points of Sn satisfying ^„ — Mn"-"^/^ < t",^ ^ — n~^/^ < 

Now, fix e > and define the M and accordingly. Define the functions 

and Lp^n^ by 

'PnXx) = {T+ -x)l^^-^^+^{x) and ip^^\x) = {t^ - x)l^^~ ^^+^{x) 

and note that, for e > sufficiently small, the piecewise linear functions 
defined by + eipn\zi) and — e^pn\zi) (and linear interpolation 

between observation points) belong to the class 5„. Hence, 

\im£-\Q{sn + e<p^^^) - Q{~Sn)) > 0. 

elO 

This implies, taking into account issues related to piecewise linearity of the 
function via the Op(n~^logn) term. 



(43) r (r+ - x)s„(x) dx- f (t+ - x) dUn{x) >0p( 
Similarly, taking —£ipn^ instead of eipn'^ , we obtain 



logn 



n 



(44) / (t+ - xYSnix) dx - I ^ (t+ - X) dUn{x) < 0. 

From (43) and (44) we obtain 

(r+ - x){sn{x) - so{x)) dx - i ^ (r+ - x) d{Un - Uo){x) 



(45) 

flogn 

= Up 



.+1 



n 

Now, suppose that 

(46) inf \snix) - so{x)\ > cn'"^/^. 

3;e[T~,T+] 

Then 



■x){Sn{x) -Soix))dx 



> c(t+ - t„ ) n 



2„-2/5 



which, in view of (45), implies (using that $\tau_n^+ - \tau_n^- \ge n^{-1/5}$)



(47) 



{t+ -x)d{Un-Uo){x) 



-4/5 



Also, note that 

{t+ -x)d{Un-Uo){x) 



{Un-Uo){x)-{Un-Uo){T-)dx 

= Op(n-4/5) 

by Lemmas A.3 and A.6. Hence, the probability of (46) is smaller than
or equal to that of (47), which can be made arbitrarily small by taking $c$
sufficiently large. □



Lemma A.5. For each $M > 0$,

$$\sup_{t \in [-M, M]} |\tilde s_n(x_0 + n^{-1/5}t) - s_0(x_0) - n^{-1/5}t\,s_0'(x_0)| = O_p(n^{-2/5})$$

and

$$\sup_{t \in [-M, M]} |\tilde s_n'(x_0 + n^{-1/5}t) - s_0'(x_0)| = O_p(n^{-1/5}).$$

Proof. This follows from Lemmas A.4 and A.3 in the same way Lemma 4.4
follows from Lemmas 4.3 and 4.2 in Groeneboom, Jongbloed and Wellner
(2001b). □



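Proof of Lemma A.2. Note that the characterization of $\tilde s_n$ in Theorem
2.10 implies that, for all bend points $\tau_n$ of $\tilde s_n$,

$$H_n(\tau_n) = Y_n(\tau_n) \quad\text{and}\quad H_n'(\tau_n) = Y_n'(\tau_n) + O_p\!\left(\frac{\log n}{n}\right), \tag{48}$$

where the derivative of $Y_n$ is to be interpreted as a right derivative. Choose
$\tau_n$, the last bend point of $\tilde s_n$ before $x_0$. First, consider $B_n$ and observe that

$$\begin{aligned} B_n &= n^{3/5}\big(H_n'(x_0) - H_n'(\tau_n) + Y_n'(\tau_n) - Y_n'(x_0) + H_n'(\tau_n) - Y_n'(\tau_n)\big)\\ &= n^{3/5}\Big(\int_{\tau_n}^{x_0} \tilde s_n(u)\,du - (U_0(x_0) - U_0(\tau_n))\Big)\\ &\quad - n^{3/5}\big((U_n - U_0)(x_0) - (U_n - U_0)(\tau_n)\big) + n^{3/5}\big(H_n'(\tau_n) - Y_n'(\tau_n)\big). \end{aligned}$$

By (48), the last term is $O_p(n^{-2/5}\log n)$. By Lemmas A.6 and A.3, the
second term is $O_p(1)$. To see that the first term is $O_p(1)$ as well, we use a
Taylor expansion of $U_0(x) = \int_0^x s_0(y)\,dy$ around $x_0$,

$$\begin{aligned} U_0(x_0) - U_0(\tau_n) &= (x_0 - \tau_n)U_0'(x_0) - \tfrac12(x_0 - \tau_n)^2 U_0''(x_0) + \tfrac16(x_0 - \tau_n)^3 U_0'''(\xi_n)\\ &= \int_{\tau_n}^{x_0}\big(s_0(x_0) + (u - x_0)s_0'(x_0)\big)\,du + \tfrac16(x_0 - \tau_n)^3 s_0''(\xi_n) \end{aligned}$$

for $\xi_n \in (\tau_n, x_0)$. Inserting this into the first term gives, for $n$ sufficiently
large,

$$\begin{aligned} n^{3/5}\Big|\int_{\tau_n}^{x_0} \tilde s_n(u)\,du - (U_0(x_0) - U_0(\tau_n))\Big| &= n^{3/5}\Big|\int_{\tau_n}^{x_0}\big[\tilde s_n(u) - s_0(x_0) - (u - x_0)s_0'(x_0)\big]\,du - \tfrac16(x_0 - \tau_n)^3 s_0''(\xi_n)\Big|\\ &\le n^{3/5}(x_0 - \tau_n)\sup_{u \in [\tau_n, x_0]} |\tilde s_n(u) - s_0(x_0) - (u - x_0)s_0'(x_0)| + \tfrac13 n^{3/5} s_0''(x_0)(x_0 - \tau_n)^3\\ &= O_p(1) \end{aligned}$$

by Lemmas A.3 and A.5.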
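Now, for $A_n$ we get

$$\begin{aligned} A_n &= n^{4/5}\big\{H_n(x_0) - H_n(\tau_n) - (x_0 - \tau_n)H_n'(\tau_n) - \big(Y_n(x_0) - Y_n(\tau_n) - (x_0 - \tau_n)Y_n'(\tau_n)\big)\big\}\\ &\quad - n^{4/5}\big\{(x_0 - \tau_n)\big(Y_n'(\tau_n) - H_n'(\tau_n)\big)\big\}. \end{aligned}$$

By (48) the second term is $O_p(n^{-2/5}\log n)$. Note that

$$H_n(x_0) - H_n(\tau_n) - (x_0 - \tau_n)H_n'(\tau_n) = \int_{\tau_n}^{x_0}\!\!\int_{\tau_n}^{y} \tilde s_n(u)\,du\,dy$$

and

$$Y_n(x_0) - Y_n(\tau_n) - (x_0 - \tau_n)Y_n'(\tau_n) = \int_{\tau_n}^{x_0} \big(U_n(u) - U_n(\tau_n)\big)\,du.$$

Therefore, the first term can be written as

$$n^{4/5}\int_{y = \tau_n}^{x_0}\Big(\int_{u = \tau_n}^{y} \tilde s_n(u)\,du - \big(U_n(y) - U_n(\tau_n)\big)\Big)\,dy.$$

Adding and subtracting $n^{4/5}\int_{\tau_n}^{x_0}(U_0(y) - U_0(\tau_n))\,dy = n^{4/5}\int_{\tau_n}^{x_0}\int_{\tau_n}^{y} s_0(u)\,du\,dy$,
this expression can in turn be written as

$$n^{4/5}\int_{\tau_n}^{x_0}\!\!\int_{\tau_n}^{y} (\tilde s_n(u) - s_0(u))\,du\,dy - n^{4/5}\int_{\tau_n}^{x_0} \big(U_n(u) - U_0(u) - (U_n(\tau_n) - U_0(\tau_n))\big)\,du.$$

Using a second-order Taylor expansion of $s_0$ around $x_0$, this expression can
be seen to equal

$$\begin{aligned} &n^{4/5}\int_{\tau_n}^{x_0}\!\!\int_{\tau_n}^{y} \big(\tilde s_n(u) - s_0(x_0) - (u - x_0)s_0'(x_0)\big)\,du\,dy\\ &\quad - n^{4/5}\int_{\tau_n}^{x_0} \big(U_n(u) - U_0(u) - (U_n(\tau_n) - U_0(\tau_n))\big)\,du\\ &\quad - \tfrac12 n^{4/5}\int_{\tau_n}^{x_0}\!\!\int_{\tau_n}^{y} (u - x_0)^2 s_0''(\xi_n)\,du\,dy = O_p(1), \end{aligned}$$

with $\xi_n \in (\tau_n, x_0)$, by Lemmas A.3, A.5 and A.6. □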

Lemma A.6. Assume the kernel $k$ satisfies Assumption 2.6. Then

$$Z_n(t) := n^{3/5}\big((U_n - U_0)(x_0 + n^{-1/5}t) - (U_n - U_0)(x_0)\big) \xrightarrow{\;d\;} \frac{\sqrt{g_0(x_0)}}{k(0)}\,W(t),$$

in the space $D(-\infty, \infty)$ endowed with the topology of uniform convergence
on compacta. Here, $W$ denotes a two-sided standard Wiener process.

Proof. By equation (14), we can write 

Un{x) = Vn{x) - . <Gn{x) with Vn{x) = X - / - s)i{s) ds. 

k[U+) Jo 
Define Vq analogously, replacing Gn by Gq. It is easy to see that 

„3/5 /.zo+n-i/Sj 

(49) Zn{t)=Zi^\t)-^J^^ diGn-Go){x), 

where 

Z«(t) =n3/5((y„ - Vo){xo + n~'/h) - (K - Vo){xo)). 

The last term on the right-hand side of (49) converges to the two-sided 
Wiener process as indicated in the statement of the lemma. For the first 
term, we can write 

Zi'\t) = ^ (Go{y) - Gnivmxo + n-^'h - y) dy 
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- (Go(y)-G„(y)K(rEo-y)(iyj 
= n^"' (Go(y) - Gniy))iiixo + n-'^h - y) - £(xo - y)) dy 
{Go{y) - GniyMxo + n-'/^ -y)dy). 

Hence, for any M > 0, we get for n sufficiently large that 
sup \Z^^\t)\ < - GollooCMn"^/^ + 2n-^/^M\\G: 

\t\<M 

= Op(n-ViO). 